Hacker News
The Illusion of Readiness: Stress Testing Frontier Models on Medical Benchmarks (arxiv.org)
6 points by mellosouls 15 days ago | past
Report on the 63rd Annual International Mathematical Olympiad (arxiv.org)
1 point by bikenaga 15 days ago | past
A fast, strong, topologically meaningful and fun knot invariant (arxiv.org)
52 points by bikenaga 15 days ago | past | 7 comments
Quantized LLMs in Biomedical Natural Language Processing (arxiv.org)
1 point by PaulHoule 15 days ago | past
Ransomware 3.0: Self-Composing and LLM-Orchestrated (arxiv.org)
1 point by PaulHoule 15 days ago | past
Multi-Modal vs. Text-Based: Benchmarking LLM Strategies for Invoice Processing (arxiv.org)
1 point by PaulHoule 15 days ago | past
LIMI: Less Is More for Agency (arxiv.org)
1 point by pella 16 days ago | past
Design, analysis, and manufacturing of microstructured blade-like geometries (arxiv.org)
2 points by PaulHoule 16 days ago | past
Fill probability estimates in institutional bond trading with quantum computers (arxiv.org)
2 points by polrjoy 16 days ago | past | 2 comments
Weak Memory Model Formalisms: Introduction and Survey (arxiv.org)
2 points by matt_d 16 days ago | past
Why Language Models Hallucinate (arxiv.org)
1 point by ummonk 16 days ago | past
GPU Implementation of Second-Order Linear and Nonlinear Programming Solvers (arxiv.org)
1 point by adgjlsfhk1 16 days ago | past | 1 comment
Bluffing in Scrabble (arxiv.org)
8 points by fanf2 16 days ago | past
Opal: An Operator Algebra View of RLHF (arxiv.org)
2 points by P_qRs 16 days ago | past
Effects of the entropy source on Monte Carlo simulations (arxiv.org)
2 points by bob1029 16 days ago | past
Enabling an Ecosystem of Personalized and Interoperable Social Applications (arxiv.org)
2 points by sportdeath 17 days ago | past
Space Mission Options for Reconnaissance and Mitigation of Asteroid 2024 YR4 [pdf] (arxiv.org)
2 points by croes 17 days ago | past
Discrete Diffusion in Large Language and Multimodal Models: A Survey (arxiv.org)
2 points by NeoInHacker 17 days ago | past
Personalised Pricing: The Demise of the Fixed Price? (arxiv.org)
2 points by Hard_Space 17 days ago | past
OpenFake: An Open Dataset and Platform Toward Large-Scale Deepfake Detection (arxiv.org)
4 points by pykello 17 days ago | past
Space Mission Options for Mitigation of Asteroid 2024 YR4 (arxiv.org)
4 points by geox 17 days ago | past
DeepMind Paper on Virtual Agent Economies (arxiv.org)
2 points by nanfinitum 17 days ago | past
Seeing Is Deceiving: Mirror-Based Lidar Spoofing for Autonomous Vehicle Deception (arxiv.org)
1 point by bikenaga 17 days ago | past
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs (arxiv.org)
1 point by mathattack 17 days ago | past
Are elites meritocratic and efficiency-seeking? Evidence from MBA students (arxiv.org)
103 points by bikenaga 17 days ago | past | 73 comments
Pre-training under infinite compute (arxiv.org)
3 points by jonbaer 17 days ago | past
Hyb Error: A Hybrid Metric Combining Absolute and Relative Errors (2024) (arxiv.org)
19 points by ncruces 18 days ago | past | 2 comments
The illusion of diminishing returns in LLM progress (arxiv.org)
3 points by SCEtoAux 18 days ago | past
Learn Your Way: Towards an AI-Augmented Textbook, Google Research (arxiv.org)
3 points by walterbell 18 days ago | past
Wan-Animate: Unified Character Animation, Replacement with Holistic Replication (arxiv.org)
2 points by walterbell 18 days ago | past
