This video format is killing me. Would love to keep up with the announcement but...

layer8 · 2024-12-22T23:18:39 1734909519

AI video summary at your service:

In this video, Sam Altman and team introduce OpenAI o3 and o3-mini, showcasing their advanced reasoning capabilities and performance benchmarks compared to earlier models. The session highlights the models' robust accuracy in coding and mathematical tasks, as well as new safety testing initiatives involving community participation. The o3-mini is noted for its cost-efficient performance, while safety strategies are enhanced through deliberative alignment.

## Key Points

### Introduction to o3 and o3-mini models

During the final day of their 12-day event, OpenAI announces two new models: o3 and o3-mini. These models are positioned as advancements in AI reasoning capabilities, following the success of the earlier o1 model.

### Performance benchmarks

The o3 model achieves significant improvements in capabilities, scoring 71.7% on software benchmarks and reaching near-expert levels on competitive math exams. In comparison to o1, it shows over 20% improvement in coding tasks.

### Safety testing initiatives

OpenAI emphasizes safety testing for the new models, opening access for researchers to facilitate public testing. The goal is to ensure models are safe for general use while their capabilities are further validated.

### Introduction of o3-mini

The o3-mini is presented as a cost-efficient alternative to o3, also equipped with strong reasoning powers, allowing adjustable thinking time to optimize performance based on user needs.

### New safety techniques: deliberative alignment

A new safety training technique, deliberative alignment, uses reasoning capabilities of the models to create better safety benchmarks, enhancing the ability to accurately reject unsafe prompts.

### Future developments and availability

The video concludes with information on how to apply for early access to test these models with expectations of full public availability in early 2025, alongside a call for safety researchers to contribute.

uncomplexity_ · 2024-12-22T23:57:53 1734911873

the mini models are the most exciting to me. they seem to hit the balance of general utility and cost.

people are fond of comparing large bleeding edge models, but when we compare the small ones to say gpt3.5, it is still astonishing.

to me the smaller models being the balance of intelligence and cost is the best indicator of what the general population can use and afford.

the larger models end up being used mostly by individuals teams and orgs who can afford to pay for it and learn how to use it.

keep in mind at this day and age there is still a lot of people who dont know these things exist. it's just outside of their reality. then there are who have the slightest idea about it, but won't commit the time and money needed to learn it and fully experience it.

this is like the new generational transfer of wealth and information. even if you're an individual or a small team, with enough levers you can use these technologies to your advantage, and with more levers (like yc) you can enter the battlefield and compete with existing companies.

in-pursuit · 2024-12-22T23:37:25 1734910645

Can this “expert in math” do binary addition? o1 falls apart after 10 bits, which is easily memorizable.

fny · 2024-12-22T23:47:40 1734911260

Can this "expert in math" write a function that performs binary addition?

in-pursuit · 2024-12-23T01:03:41 1734915821

The issue isn’t performing the specific addition. Rather, you’re asking o1 to take n-bits of data and combine them according to some set of rules. Isn’t that what these models are supposed to excel at, following instructions? Binary addition is interesting because the memorization space grows at 2^n, which is impossible to memorize for moderate values of n.

uncomplexity_ · 2024-12-23T00:03:06 1734912186

this is the way.

its internal mechanism is still statistical prediction of text in units of tokens. that math seemed lucid enough to be able to use functions.

fny · 2024-12-23T00:11:31 1734912691

It’s like everyone suddenly forgot you can’t do O(n^2) compute in O(n) time.

in-pursuit · 2024-12-23T01:04:04 1734915844

Binary addition is O(n)

fny · 2024-12-24T22:26:07 1735079167

I meant this in the general case, not specifically binary addition. Also, returning an token by ChatGPT is technically an O(1) operation, so the same principle applies. Returning a computation answer of O(n_required_tokens) cannot be delivered in O(1) time without some sort of caching.

karimdaghari · 2024-12-22T23:33:46 1734910426

It's a bit ironic that it got the availability date wrong no?

layer8 · 2024-12-22T23:39:24 1734910764

It’s typical I’d say. I corrected it now.

(It originally said “early 2024” instead of “early 2025”. The YouTube transcript says “around the end of January”.)

muratsu · 2024-12-22T23:54:05 1734911645

I strongly agree. I stopped checking their page for news after day 2 due to video only announcements.