Hacker News
Biases in the Blind Spot: Detecting What LLMs Fail to Mention (arxiv.org)
1 point by mpweiher 3 hours ago | past | discuss
A Framework for Time-Updating Probabilistic Forecasts (arxiv.org)
5 points by Luc 10 hours ago | past | discuss
Towards Autonomous Mathematics Research (Google DeepMind) (arxiv.org)
1 point by u1hcw9nx 11 hours ago | past | discuss
Remote Labor Index: Measuring AI Automation of Remote Work (arxiv.org)
2 points by Leynos 1 day ago | past | discuss
Generalized on-policy distillation with reward extrapolation (arxiv.org)
3 points by fzliu 1 day ago | past | discuss
OpenAI model proposes and proves Physics result (arxiv.org)
1 point by KothuRoti 1 day ago | past | discuss
An API for Biological Neural Networks (arxiv.org)
1 point by bwjx 1 day ago | past | discuss
Adversarial Patch: images that make classifiers ignore other items in a scene (arxiv.org)
1 point by felineflock 1 day ago | past | discuss
Maximum Agreement Linear Predictor (MALP) (arxiv.org)
1 point by tesserato 1 day ago | past | 1 comment
Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators (arxiv.org)
1 point by PaulHoule 1 day ago | past | discuss
Fine-Tuning GPT-5 for GPU Kernel Generation (arxiv.org)
4 points by matt_d 1 day ago | past | discuss
SWE-ContextBench: context learning benchmark in coding (arxiv.org)
1 point by mustaphah 1 day ago | past | discuss
LLMs exceed physicians on complex text-based differential diagnosis (arxiv.org)
3 points by rippeltippel 1 day ago | past | 2 comments
Horus: A Protocol For Trustless Verification Under Uncertainty (arxiv.org)
1 point by optimalsolver 1 day ago | past | discuss
Learning to Reason in 13 Parameters (arxiv.org)
2 points by stared 1 day ago | past | discuss
LLM Reasoning Failures (arxiv.org)
1 point by gradus_ad 1 day ago | past | discuss
Defining causal mechanism in dual process theory and 2 types of feedback control (arxiv.org)
1 point by s6i 1 day ago | past | discuss
Routing LLM queries using internal success predictions (70% cost reduction) (arxiv.org)
1 point by stansApprentice 2 days ago | past | 2 comments
SWE-AGI: benchmarking spec-driven software construction (arxiv.org)
1 point by mustaphah 2 days ago | past | 1 comment
Authenticated Workflows: A Systems Approach to Deterministic Agentic Controls (arxiv.org)
3 points by mrajagopalan 2 days ago | past | 1 comment
Formalization and Inevitability of the Pareto Principle (arxiv.org)
3 points by bikenaga 2 days ago | past | 1 comment
RL on GPT-5 to write better kernels (arxiv.org)
4 points by atallahw 2 days ago | past | 1 comment
Quantum observers can communicate across multiverse branches (arxiv.org)
2 points by lisper 2 days ago | past | discuss
Pushing Tensor Accelerators Beyond MatMul in a User-Schedulable Language (arxiv.org)
1 point by matt_d 2 days ago | past | discuss
HySparse: A Hybrid Sparse Attention Architecture (arxiv.org)
5 points by readitalready 2 days ago | past | discuss
Biases in the Blind Spot: Detecting What LLMs Fail to Mention (arxiv.org)
1 point by jari_mustonen 2 days ago | past | discuss
Evaluation of RAG Architectures for Policy Document Question Answering (arxiv.org)
1 point by PaulHoule 2 days ago | past | discuss
SoftMatcha 2: A Fast and Soft Pattern Matcher for Trillion-Scale Corpora (arxiv.org)
3 points by salkahfi 2 days ago | past | discuss
Opus: Towards Efficient and Principled Data Selection in LLM Pre-Training (arxiv.org)
2 points by onurkanbkrc 2 days ago | past | discuss
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters (arxiv.org)
1 point by onurkanbkrc 2 days ago | past | 1 comment