Hacker News | enonimal's comments

nominative determinism having a field day


I don't have an answer but I'd love to know this too. Why has Postgres got this unique staying power?


Coming at this from a small sample size, every time I've seen it used has been because some of the developers on a team love it and think it's cooler than the other options. And, over time, the operational experience has gotten better (AWS' Postgres support for RDS/Aurora is all recent, for example); and, in fairness, I'd take psql over SQL Server any day of the week.

Why it has popularity beyond MySQL/MariaDB is still a confounding mystery as far as I'm concerned. The additional behaviors Postgres tends to encourage (I'm looking at you, publisher/subscriber and trigger functions) seem to lead to devs advocating it as 'easy' while those in my position are left to keep the damn thing running.


I developed my preference for PostgreSQL years ago, before MySQL supported foreign key constraints or defaulted to durable commits. MySQL also had this annoying tendency to silently store invalid timestamps as zero. All of these things have been fixed since (I hope?), but I still can’t shake my impression that PostgreSQL takes correctness more seriously.


I would say it's similar to Linux:

It's a free, solid foundational technology, guided by steady hands.

In a software economy full of profiteers, charlatans, and marketing babble, the project is providing real value to users.


> It's a free, solid foundational technology, guided by steady hands.

Beautifully said.


> Number of Posts with negative sentiment, grouped by Topic

> # 1 Result: Python Packaging

Checks out


The Python package is really well engineered, and Stainless, the startup that builds the client from the OpenAPI spec, is doing a good job.

This shows laypeople piling into a hype thing and running immediately into the roadblock of programming.

Normal people don't want to like, put in effort to feel like they are a part of something.

They are used to "just" having to turn on Netflix to feel like they are a part of the biggest TV show, or "just" having to click a button to buy a Stanley Cup, or "just" having to click a button to buy Bitcoin. The API and performance issues, IMO, are not noise, but they are meaningless. To me this also signals how badly Grok and Stability are doing it: they are doubling and tripling down on popular opinions that have a strong, objective meaninglessness to them (like how fast the tokens come out and how much porn you're allowed to make). Meanwhile, the Grok people are looking at this analysis and feeling very validated right now.

I have no dog in this race, but I would hope that the OpenAI people do not waste any time on Python APIs for dumb people; instead, they should definitely improve their store and have a firmer opinion on how that should look. They almost certainly have a developing opinion on a programming paradigm for chatbots, but I feel like they are hamstrung by needing to quantize their models to meet demand, not by decisions about the look and feel of Python APIs or the crappiness of the Python packaging ecosystem. Another POV: the Apple development experience remains notoriously crappy, and yet Apple is the most valuable platform for most companies in the world right now; and JetBrains could not sustain an audience for the AppCode IDE, because everyone uses middlewares anyway; so I really don't think Python APIs matter as much as the community says they do. It's a Nice to Have, but it Does Not Matter.


we may think more similarly than you seem to think...

this was more a slam on Python packaging in general than on the OpenAI implementation.

I wouldn't be surprised if many of the issues under this topic are more related to Python package version nightmares than to OpenAI's Python implementation itself.


A pro-tip for using the OpenAI API is to not use the official Python package for interfacing with it. The REST API documentation is good, and just using it in your HTTP client of choice like requests is roughly the same LOC without unexpected issues, along with more control.
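
For example, a rough sketch with requests (model name and prompt are placeholders, and you'd want your own error handling):

    import os
    import requests

    # Call the chat completions endpoint directly, no SDK required.
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-3.5-turbo",  # placeholder model name
            "messages": [{"role": "user", "content": "Say hello"}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])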


I've found this happens with a lot of first party clients. At work, we use LaunchDarkly for feature flags and use their code references tool to keep track of where flags are being referenced. The tool uses their first party Go client to interact with the API but the client doesn't handle rate limiting at all even though they have rate limiting headers clearly documented for their API.
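
The workaround ends up being the same boilerplate everywhere: a retry loop that respects whatever retry/reset header the API exposes. Roughly, in any language (header name is just an example; Python for illustration):

    import time
    import requests

    def get_with_backoff(url, headers, max_attempts=5):
        # Generic pattern: when the API answers 429, sleep for the time it
        # advertises (falling back to exponential backoff) and try again.
        for attempt in range(max_attempts):
            resp = requests.get(url, headers=headers, timeout=30)
            if resp.status_code != 429:
                return resp
            wait = float(resp.headers.get("Retry-After", 2 ** attempt))
            time.sleep(wait)
        return resp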


First party clients are typically an afterthought, and you can't add features without getting a PM to sign off, which strangles the impulse to polish & sand down rough edges.


Agreed. Any in particular come to mind that you'd like to see improved?

(my company provides first-party clients with a lot of polish; maybe we could help)


Hey minimaxir, I help maintain the official OpenAI Python package. Mind sharing what issues you've had with it? (Have you used it since November, when the 1.0 was released?)

Keen for your feedback, either here or email: alex@stainlessapi.com


There's nothing wrong with it per se; it works as advertised. But as a result it's a somewhat redundant dependency.


Ah, gotcha. Thanks, that makes sense. FWIW, here are some things it provides which might be worth having:

1. typesafety (for those using pyright/mypy) and autocomplete/intellisense

2. auto-retry (w/ backoff, intelligently so w/ rate limits) and error handling

3. auto-pagination (can save a lot of code if you make list calls)

4. SSE parsing for streaming

5. (coming soon) richer streaming & function-calling helpers (can save / clean up a lot of code)

Not all of these matter to everybody (e.g., I imagine you're not moved by such benefits as "dot notation over dictionary access", which some devs might really like).

I would argue that auto-retry would benefit a pretty large percentage of users, though, especially since the 429 handling can paper over a lot of rate limits to the point that you never actually "feel" them. And spurious/temporary network errors or 500s also ~disappear.

For some simple use-cases, none of these would really matter, and I agree with you - especially if it's not production code and you don't use a type-aware editor.
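
If it helps make (2) concrete, here's roughly what the retry knobs look like on the 1.x client (a sketch; model name is a placeholder):

    from openai import OpenAI

    # Retries with backoff for 429s and transient 5xx are handled inside
    # the client; max_retries and timeout are per-client settings in 1.x.
    client = OpenAI(max_retries=5, timeout=30.0)

    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": "Say hello"}],
    )
    print(completion.choices[0].message.content)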


FWIW, here are the only links I could find in the article which were tagged "Python3 Package": https://community.openai.com/t/647723 and https://community.openai.com/t/586484 . Note they don't seem to have anything to do with the Python package whatsoever.

I was pretty disappointed to see this, as I work on the Python package and was hoping for a good place to find feedback (apart from the github issues, which I monitor pretty closely).

I'm not a data scientist; maybe someone from the Julep team could comment on the labeling? Or on how I could find some more specific themes of problems with the Python package? (Was it just that people who have a problem of some kind happen to also use the Python library?)


Hey! Happy to chat over email/X more closely and help you out.

Nomic Atlas automatically generates the labels here. There could be different variations of posts grouped under the Python package topic.

But I did some manual digging, and here's what I found: heading over to the map and filtering by posts around "Python Packages" leads to around 900 posts.

Sharing a few examples of posts that do relate to the Python package:

- https://community.openai.com/p/701058

- https://community.openai.com/p/652075

- https://community.openai.com/t/32442

- https://community.openai.com/p/143928

Note: My intuition is that most of the posts are very basic, probably user errors like "No API Key Found" etc.


gotcha, that makes sense - thank you!


how do you perceive privacy in such a world?


> "Who says plasticity is good"

In Machine Learning, we might think of this idea as "setting your Learning Rate too high"


In ML, this will bump you into a nearby optimum at random. That optimum may or may not be better than your prior one. And if it's worse, there's a good chance it'll be very difficult to get back to where you were without external checkpoints to revert to.

The brain is the same, but no checkpoints.
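
As a toy illustration of the too-high-learning-rate point (plain Python, made-up numbers): gradient descent on f(x) = x^2 settles down with a small step size and blows up with a large one.

    def gradient_descent(lr, steps=20, x0=3.0):
        # Minimize f(x) = x^2, whose gradient is 2x.
        x = x0
        for _ in range(steps):
            x = x - lr * 2 * x
        return x

    print(gradient_descent(lr=0.1))  # shrinks toward the optimum at 0
    print(gradient_descent(lr=1.1))  # each step overshoots; |x| explodes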


I wouldn't say the brain has no checkpoints. Certain events (like concussions or strokes) can knock out memories or habits and leave you roughly where you were some months or years ago. Also, certain things like Dissociative Identity Disorder can result in past/younger versions of yourself splitting off and halting development for years or more. Then they can come back up later.

With that said, I don't think anyone really has control over this. I'm a DID system and I have some sort of limited control but my time skips are usually on the order of days or weeks, not years.


Thank you -- I had always wondered what the optimization was behind that Algo!


law:

prompt engineering before it was cool


maybe a fun karpathy video here...


This is a cool idea -- is this an inner-loop process (i.e. after each LLM evaluation, the output is considered to choose the next sample) or a pre-loop process (get a subset of samples before tests are run)?


It seems that you're the only one who understood the idea. I don't know whether current LLMs use such a method or not, but the idea could be 10 times faster.


AFAICT, this is a more advanced way of using Embeddings (which can encode for the vibes similarity (not an official term) of prompts) to determine where you get the most "bang for your buck" in terms of testing.

For instance, if there are three conversations that you can use to test if your AI is working correctly:

(1) HUMAN: "Please say hello"

    AI: "Hello!"
(2) HUMAN: "Please say goodbye"

    AI: "Goodbye!"

(3) HUMAN: "What is 2 + 2?"

    AI: "4!"


Let's say you can only pick two conversations to evaluate how good your AI is. Would you pick 1 & 2? Probably not. You'd pick 1 & 3, or 2 & 3.

Because Embeddings allow us to determine how similar in vibes things are, we have a tool with which we can automatically search over our dataset for things that have very different vibes, meaning that each evaluation run is more likely to return new information about how well the model is doing.

My question to the OP was mostly about whether or not this "vibe differentiated dataset" was constructed prior to the evaluation run, or populated gradually, based on each individual test case result.

so anyway it's just vibes man
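
Concretely, here's a rough sketch of the kind of selection I mean (assuming you've already computed an embedding vector per conversation; greedy max-min selection, not necessarily what the OP actually uses):

    import numpy as np

    def pick_diverse(embeddings, k):
        # Greedy max-min ("farthest point") selection: repeatedly pick the
        # item whose nearest already-chosen neighbour is least similar, so
        # the chosen test cases have very different vibes.
        emb = np.asarray(embeddings, dtype=float)
        emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
        chosen = [0]  # start from an arbitrary item
        while len(chosen) < k:
            sims = emb @ emb[chosen].T            # cosine similarity to picks
            nearest = sims.max(axis=1)            # closest already-chosen item
            nearest[chosen] = np.inf              # never re-pick
            chosen.append(int(nearest.argmin()))  # most dissimilar overall
        return chosen

    # Toy 2-D "embeddings" for the three conversations above:
    print(pick_diverse([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], k=2))  # -> [0, 2]

Whether you build that subset once up front or keep updating it as results come in is exactly the inner-loop vs pre-loop question.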


That's probably the intent, but I don't know if this actually achieves it (I have another comment here about the use of bayesopt). But even if it did, bayesopt operates sequentially (it's a Sequential Model-Based Optimizer, or SMBO), so the trajectory of queries each LLM gets evaluated on would be different. Unless there is something to correct this cascading bias, I don't know if you could use this to compare LLMs, or obtain a score that's comparable to standard reported numbers.

On a different note, if all we want is a diverse set of representative samples (based on embeddings), there are algorithms like DivRank that do that quite well.


biodigital jazz man


This would be an inner-loop process. However, the selection is way faster than the LLM calls, so it shouldn't be noticeable (hopefully).


> Take courage and set out to write up the Great Discovery; if after many hours of red-hot thinking and writing you discover to your dismay a fatal flaw . . . all is not lost. Go back to the first paragraph and write something along the lines of “It is tempting to think that . . . ”

XD

