Hacker News
TaxCalcBench: Evaluating Frontier Models on the Tax Calculation Task (arxiv.org)
70 points by handfuloflight 8 days ago | past | 23 comments
A Survey of Vibe Coding with Large Language Models (arxiv.org)
1 point by Gigacore 8 days ago | past | discuss
Towards Logic: The Language of AI (arxiv.org)
3 points by cmogni1 9 days ago | past | discuss
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation (arxiv.org)
1 point by adidoit 9 days ago | past | 1 comment
Agentic Bug Reproduction for Effective Automated Program Repair at Google (arxiv.org)
1 point by chw9e 9 days ago | past | discuss
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? (2024) (arxiv.org)
1 point by fzliu 9 days ago | past | discuss
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation (arxiv.org)
1 point by randomwalker 9 days ago | past | discuss
[flagged] Gravity Can Explain the Collapse of the Wavefunction (Sabine Hossenfelder) (arxiv.org)
49 points by felineflock 9 days ago | past | 55 comments
Certifying almost all quantum states with few single-qubit measurements (arxiv.org)
1 point by rbanffy 9 days ago | past | discuss
A Less Terrifying Universe? Mundanity as an Explanation for the Fermi Paradox (arxiv.org)
5 points by cyberlimerence 9 days ago | past | 1 comment
Robot Learning: A Tutorial (arxiv.org)
2 points by Anon84 9 days ago | past | discuss
Evaluating Argon2 adoption and effectiveness in real-world software (arxiv.org)
32 points by pregnenolone 9 days ago | past | 32 comments
Tensor Logic: The Language of AI (arxiv.org)
3 points by max_ 9 days ago | past | discuss
Subspace-Accelerated Coordinate Descent for Physics-Based Simulation (arxiv.org)
1 point by E-Reverance 9 days ago | past | discuss
Old Is Gold: Optimizing Single-Threaded Applications with Exgen-Malloc (arxiv.org)
16 points by todsacerdoti 9 days ago | past | 7 comments
Inferring User Actions from Screen Recordings to Recommend Better Workflows (arxiv.org)
2 points by azhenley 9 days ago | past | discuss
PEFT Evaluation for Safe Code Generation (arxiv.org)
1 point by grac3 10 days ago | past | discuss
Refrag: Rethinking RAG Based Decoding (arxiv.org)
2 points by bbzjk7 10 days ago | past | discuss
Dynamically relevant consciousness precludes artificial consciousness (2023) (arxiv.org)
2 points by measurablefunc 10 days ago | past | 1 comment
Reducing Pipeline Bubbles with Adaptive Parallelism on Heterogeneous Models (arxiv.org)
2 points by PaulHoule 10 days ago | past | 1 comment
Who Said Neural Networks Aren't Linear? (arxiv.org)
2 points by ComplexSystems 10 days ago | past | discuss
Are Foundation Models Ready for Industrial Defect Recognition? A Reality Check (arxiv.org)
5 points by PaulHoule 10 days ago | past | discuss
The Optimal Strategy for Playing Lucky 13 (arxiv.org)
1 point by belter 10 days ago | past | discuss
Gravity can explain the collapse of the wavefunction (arxiv.org)
18 points by dboreham 10 days ago | past | 14 comments
Agentic Context Engineering: Evolving Contexts for Self-Improving LLMs (arxiv.org)
2 points by mooreds 10 days ago | past | discuss
From Automation to Autonomy (arxiv.org)
2 points by jruohonen 10 days ago | past | 1 comment
Literate Tracing (arxiv.org)
3 points by todsacerdoti 10 days ago | past | discuss
AutoPR: Let's Automate Your Academic Promotion [pdf] (arxiv.org)
2 points by SerCe 11 days ago | past | 1 comment
StreamingVLM: Real-Time Understanding for Infinite Video Streams (arxiv.org)
33 points by badmonster 11 days ago | past | discuss
Mano: Multi-Modal Foundation Model and 3-Stage RL for SOTA GUI Automation (arxiv.org)
2 points by jinqueeny 11 days ago | past | discuss
