Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs (baseten.co)
247 points by philipkiely 40 days ago | past | 175 comments

Continuous vs. dynamic batching for AI inference (baseten.co)
1 point by aaronng91 40 days ago | past

A guide to LLM inference and performance (baseten.co)
1 point by skidrow 7 months ago | past

Deploying custom ComfyUI workflows as APIs (baseten.co)
1 point by AnhTho_FR 9 months ago | past

How to build function calling and JSON mode for open-source and fine-tuned LLMs (baseten.co)
1 point by philipkiely on Sept 12, 2024 | past

How to double tokens per second for Llama 3 with Medusa (baseten.co)
2 points by philipkiely on Aug 20, 2024 | past

Show HN: Automatically Build Nvidia TRT-LLM Engines (baseten.co)
2 points by mikejulietbravo on Aug 1, 2024 | past

Show HN: 60% higher tokens per second for 70B custom LLMs (baseten.co)
1 point by mikejulietbravo on July 31, 2024 | past

Show HN: Baseten Chains – Framework and SDK for Multi-Model AI Products (baseten.co)
9 points by mikejulietbravo on June 27, 2024 | past | 5 comments

Open Source Inference Engine Baseten Raises $40M from IVP, Spark and Greylock (baseten.co)
2 points by mikejulietbravo on March 14, 2024 | past | 1 comment

FP8: Efficient model inference with 8-bit floating point numbers (baseten.co)
2 points by philipkiely on March 8, 2024 | past

Introduction to quantizing machine learning models (baseten.co)
1 point by tuhins on Feb 16, 2024 | past

Faster Mixtral inference with TensorRT-LLM and quantization (baseten.co)
2 points by tikkun on Dec 27, 2023 | past | 1 comment

A guide to open-source LLM inference and performance (baseten.co)
113 points by varunshenoy on Nov 20, 2023 | past | 14 comments

How we got Stable Diffusion XL inference to under 2 seconds (baseten.co)
51 points by varunshenoy on Aug 31, 2023 | past | 5 comments

SDXL inference in under 2 seconds (baseten.co)
3 points by tuhins on Aug 31, 2023 | past | 1 comment

Three techniques to adapt LLMs for any use case (baseten.co)
1 point by philipkiely on June 15, 2023 | past

Show HN: ChatLLaMA – A ChatGPT style chatbot for Facebook's LLaMA (baseten.co)
402 points by aaronrelph on March 22, 2023 | past | 215 comments

Show HN: Fine-tune generative models in 1 line of code (baseten.co)
16 points by aqader on March 1, 2023 | past

Serving four million Riffusion requests in two days (baseten.co)
5 points by philipkiely on Dec 21, 2022 | past

Accelerating model deployment: 100X faster dev loops with draft models (baseten.co)
1 point by tuhins on Dec 9, 2022 | past

Show HN: Free Stable Diffusion 2.0 hosted interface (baseten.co)
25 points by philipkiely on Nov 24, 2022 | past | 2 comments

Try it yourself: Speech to text with Whisper (baseten.co)
5 points by philipkiely on Oct 1, 2022 | past

Deploying Stable Diffusion in Production Using Truss (baseten.co)
3 points by philipkiely on Sept 1, 2022 | past

Hosted Stable Diffusion Demo (baseten.co)
7 points by philipkiely on Aug 24, 2022 | past

Code generation interactive demo (Salesforce Codegen mono 2B) (baseten.co)
2 points by philipkiely on July 1, 2022 | past

DALL-E Mini – Generate images from a text prompt (baseten.co)
52 points by tuhins on June 10, 2022 | past | 22 comments

Demo – Text generation with EleutherAI's GPT-J-6B model (baseten.co)
1 point by tuhins on April 29, 2022 | past

Show HN: Baseten – Build ML-powered applications (baseten.co)
112 points by philipkiely on April 26, 2022 | past | 11 comments

How BaseTen is using “docs as code” (baseten.co)
5 points by philipkiely on March 9, 2022 | past