Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs (baseten.co)
247 points by philipkiely 40 days ago | past | 175 comments

Continuous vs. dynamic batching for AI inference (baseten.co)
1 point by aaronng91 40 days ago | past

A guide to LLM inference and performance (baseten.co)
1 point by skidrow 7 months ago | past

Deploying custom ComfyUI workflows as APIs (baseten.co)
1 point by AnhTho_FR 9 months ago | past

How to build function calling and JSON mode for open-source and fine-tuned LLMs (baseten.co)
1 point by philipkiely on Sept 12, 2024 | past

How to double tokens per second for Llama 3 with Medusa (baseten.co)
2 points by philipkiely on Aug 20, 2024 | past

Show HN: Automatically Build Nvidia TRT-LLM Engines (baseten.co)
2 points by mikejulietbravo on Aug 1, 2024 | past

Show HN: 60% higher tokens per second for 70B custom LLMs (baseten.co)
1 point by mikejulietbravo on July 31, 2024 | past

Show HN: Baseten Chains – Framework and SDK for Multi-Model AI Products (baseten.co)
9 points by mikejulietbravo on June 27, 2024 | past | 5 comments

Open Source Inference Engine Baseten Raises $40M from IVP, Spark and Greylock (baseten.co)
2 points by mikejulietbravo on March 14, 2024 | past | 1 comment

FP8: Efficient model inference with 8-bit floating point numbers (baseten.co)
2 points by philipkiely on March 8, 2024 | past

Introduction to quantizing machine learning models (baseten.co)
1 point by tuhins on Feb 16, 2024 | past

Faster Mixtral inference with TensorRT-LLM and quantization (baseten.co)
2 points by tikkun on Dec 27, 2023 | past | 1 comment

A guide to open-source LLM inference and performance (baseten.co)
113 points by varunshenoy on Nov 20, 2023 | past | 14 comments

How we got Stable Diffusion XL inference to under 2 seconds (baseten.co)
51 points by varunshenoy on Aug 31, 2023 | past | 5 comments

SDXL inference in under 2 seconds (baseten.co)
3 points by tuhins on Aug 31, 2023 | past | 1 comment

Three techniques to adapt LLMs for any use case (baseten.co)
1 point by philipkiely on June 15, 2023 | past

Show HN: ChatLLaMA – A ChatGPT style chatbot for Facebook's LLaMA (baseten.co)
402 points by aaronrelph on March 22, 2023 | past | 215 comments

Show HN: Fine-tune generative models in 1 line of code (baseten.co)
16 points by aqader on March 1, 2023 | past

Serving four million Riffusion requests in two days (baseten.co)
5 points by philipkiely on Dec 21, 2022 | past

Accelerating model deployment: 100X faster dev loops with draft models (baseten.co)
1 point by tuhins on Dec 9, 2022 | past

Show HN: Free Stable Diffusion 2.0 hosted interface (baseten.co)
25 points by philipkiely on Nov 24, 2022 | past | 2 comments

Try it yourself: Speech to text with Whisper (baseten.co)
5 points by philipkiely on Oct 1, 2022 | past

Deploying Stable Diffusion in Production Using Truss (baseten.co)
3 points by philipkiely on Sept 1, 2022 | past

Hosted Stable Diffusion Demo (baseten.co)
7 points by philipkiely on Aug 24, 2022 | past

Code generation interactive demo (Salesforce Codegen mono 2B) (baseten.co)
2 points by philipkiely on July 1, 2022 | past

DALL-E Mini – Generate images from a text prompt (baseten.co)
52 points by tuhins on June 10, 2022 | past | 22 comments

Demo – Text generation with EleutherAI's GPT-J-6B model (baseten.co)
1 point by tuhins on April 29, 2022 | past

Show HN: Baseten – Build ML-powered applications (baseten.co)
112 points by philipkiely on April 26, 2022 | past | 11 comments

How BaseTen is using “docs as code” (baseten.co)
5 points by philipkiely on March 9, 2022 | past