Hacker News
Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs (baseten.co)
247 points by philipkiely 40 days ago | past | 175 comments
Continuous vs. dynamic batching for AI inference (baseten.co)
1 point by aaronng91 40 days ago | past
A guide to LLM inference and performance (baseten.co)
1 point by skidrow 7 months ago | past
Deploying custom ComfyUI workflows as APIs (baseten.co)
1 point by AnhTho_FR 9 months ago | past
How to build function calling and JSON mode for open-source and fine-tuned LLMs (baseten.co)
1 point by philipkiely on Sept 12, 2024 | past
How to double tokens per second for Llama 3 with Medusa (baseten.co)
2 points by philipkiely on Aug 20, 2024 | past
Show HN: Automatically Build Nvidia TRT-LLM Engines (baseten.co)
2 points by mikejulietbravo on Aug 1, 2024 | past
Show HN: 60% higher tokens per second for 70B custom LLMs (baseten.co)
1 point by mikejulietbravo on July 31, 2024 | past
Show HN: Baseten Chains – Framework and SDK for Multi-Model AI Products (baseten.co)
9 points by mikejulietbravo on June 27, 2024 | past | 5 comments
Open Source Inference Engine Baseten Raises $40M from IVP, Spark and Greylock (baseten.co)
2 points by mikejulietbravo on March 14, 2024 | past | 1 comment
FP8: Efficient model inference with 8-bit floating point numbers (baseten.co)
2 points by philipkiely on March 8, 2024 | past
Introduction to quantizing machine learning models (baseten.co)
1 point by tuhins on Feb 16, 2024 | past
Faster Mixtral inference with TensorRT-LLM and quantization (baseten.co)
2 points by tikkun on Dec 27, 2023 | past | 1 comment
A guide to open-source LLM inference and performance (baseten.co)
113 points by varunshenoy on Nov 20, 2023 | past | 14 comments
How we got Stable Diffusion XL inference to under 2 seconds (baseten.co)
51 points by varunshenoy on Aug 31, 2023 | past | 5 comments
SDXL inference in under 2 seconds (baseten.co)
3 points by tuhins on Aug 31, 2023 | past | 1 comment
Three techniques to adapt LLMs for any use case (baseten.co)
1 point by philipkiely on June 15, 2023 | past
Show HN: ChatLLaMA – A ChatGPT style chatbot for Facebook's LLaMA (baseten.co)
402 points by aaronrelph on March 22, 2023 | past | 215 comments
Show HN: Fine-tune generative models in 1 line of code (baseten.co)
16 points by aqader on March 1, 2023 | past
Serving four million Riffusion requests in two days (baseten.co)
5 points by philipkiely on Dec 21, 2022 | past
Accelerating model deployment: 100X faster dev loops with draft models (baseten.co)
1 point by tuhins on Dec 9, 2022 | past
Show HN: Free Stable Diffusion 2.0 hosted interface (baseten.co)
25 points by philipkiely on Nov 24, 2022 | past | 2 comments
Try it yourself: Speech to text with Whisper (baseten.co)
5 points by philipkiely on Oct 1, 2022 | past
Deploying Stable Diffusion in Production Using Truss (baseten.co)
3 points by philipkiely on Sept 1, 2022 | past
Hosted Stable Diffusion Demo (baseten.co)
7 points by philipkiely on Aug 24, 2022 | past
Code generation interactive demo (Salesforce Codegen mono 2B) (baseten.co)
2 points by philipkiely on July 1, 2022 | past
DALL-E Mini – Generate images from a text prompt (baseten.co)
52 points by tuhins on June 10, 2022 | past | 22 comments
Demo – Text generation with EleutherAI's GPT-J-6B model (baseten.co)
1 point by tuhins on April 29, 2022 | past
Show HN: Baseten – Build ML-powered applications (baseten.co)
112 points by philipkiely on April 26, 2022 | past | 11 comments
How BaseTen is using “docs as code” (baseten.co)
5 points by philipkiely on March 9, 2022 | past
