Celery 5.0 (celeryproject.org)
172 points by dragonsh on Oct 6, 2020 | hide | past | favorite | 66 comments


Celery was never reliable for me. I used Celery as part of a Django app, and it kept running into weird issues. Flower, which is used to monitor Celery threads, is not maintained, and that became a huge issue for me, as I had no way of tracking down any problem.

I have also used Celery to parallelize crawling of multiple websites. Beyond a small number of workers, Celery would go into a frozen state (my data flow was one-way, so no deadlocking that I know of). The code felt too convoluted to dive into and understand.

I have since shifted to Dramatiq ( https://dramatiq.io/ ). It has all the features I like from Celery (chaining of async methods, with error handling when any link in the chain fails), and it has been super reliable. It also has Django support.

Having said that, it must be recognized that Dramatiq was built from the success and learnings gained from Celery. Celery is a wonderful piece of work still in active use in almost all Python shops. I am just thankful that we now have so many alternatives.

PS: The Django-Q mentioned in another comment looks very well structured, comprehensive and actively developed. I shall give it a try. For now, so happy with Dramatiq.


I love everything about Dramatiq (and am a contributor to the project, and use it in a few prod apps) except for the decision to use actor nomenclature. It makes sense at first glance, but in reality dramatiq is not an actor system. Actors do not live and die and get managed by supervision as they would in an environment like Erlang. Instead, messages just go onto a queue and get popped off by a worker.

I guess as long as you go into it eyes wide open - dramatiq is not an actor framework - you will be a-ok.


Celery Flower is absolutely maintained! Last commit and last release were a few months ago:

* https://github.com/mher/flower

* https://pypi.org/project/flower/#history


I faced some issues a year or two ago. It wasn't maintained then, and that was one of the pain points many users faced. One issue I hit was that scheduled tasks wouldn't show up in Flower, so there was no way for me to know whether a task had run without monitoring the database. I didn't follow the project after that, as I moved to Dramatiq.

I am so glad to hear that the community has listened and picked it up again. Thank you for pointing it out.


I am going to take a look at Django-q

I've used celery very successfully in the past, once set up. About ten years ago I had problems connecting to rabbitmq. I finally found a solution and posted it on the GitHub project page. The funny thing is that two years later I ran into the same issue, forgot about that post, ended up stumbling on my own post, and at first didn't realize it was mine.

I still get an occasional email about that solution.


Amazing news and congrats to the devs! We rely on Celery to handle about 700k tasks daily and it has not failed us once.

However, if any of the maintainers are reading: could you please stop auvipy from deliberately closing outstanding issues/PRs that he deems unimportant, only for other maintainers to re-open them?

Disclosure: I got banned for pointing that out only, so I may be slightly biased. [0]

0 - https://github.com/celery/celery/issues/4817#issuecomment-47...


Yeah, this bit me in the ass too. They close bugs that are valid but they just don't want to fix, so looking at their bug tracker gives you a misrepresentation of their project. It's a poor practice, and I've been hit by bugs in production that were closed in their bug tracker.

Also, you really need to read the fine print to understand the quirks of every backend (eg. Redis) when using Celery. For me, this plus the above means I'll never use Celery again.


We've learned through painful experience to treat Celery as little more than a vanilla pub/sub system. Some of its more "advanced" features sound nice, until you discover they're implemented in surprising or unscalable ways. They've gotten better at documenting some of these gotchas, but at this point we've been burned one too many times. I'm hopeful 2021 is the year we eliminate Celery from our system.


If you add up every hour I spent debugging celery in a production system, I could probably just rewrite it with the few features I was actually using.

It's a terrible piece of open source in that it "looks" fine from the outside; you integrate it, and then you start getting weird stability issues that are super hard to debug.


> Starting from now users should expect more frequent releases of major versions as we move fast and break things to bring you even better experience.

I was in the process of evaluating Celery for a project, and it was looking promising, but this sentence alone might have prompted me to pull the emergency brake.

I've had past experience with projects that have a policy of shipping frequent major releases in order to have frequent breaking changes, and it was never a fun time. It's not just that the breaking changes themselves are troublesome. It's also that projects that are overly liberal about removing features tend to become overly liberal about adding features, too. So that, over the long run, they have a tendency to become bloated and clunky at a faster-than-average rate.


Yeah, as a long time Celery user, I'd say you're making the right call to reevaluate your decision. Upgrading Celery/Kombu/Billiard has been enough of a hassle in the past with breaking changes, regressions, undocumented "features", and things getting dropped without any sort of deprecation timeline. I'm not looking forward to that becoming "more frequent" (and will likely be exploring other alternatives).


Just piling on here to agree with this sentiment. I've spent weeks of my life debugging celery/kombu/billiard-related issues, and I found that celery's behavior in many cases was too flakey to properly isolate a specific root cause. If I was starting a new project from scratch, would look into an alternative like Dramatiq.


I'm interested on people's thinking in this vein. If the current version meets your needs and you don't need to update to a newer major version, what's the problem if they keep releasing new major versions with breaking changes?


Not getting patches since they don’t support previous versions for very long?

"As we'd like to provide some time for you to transition, we're designating Celery 4.x an LTS release. Celery 4.x will be supported until the 1st of August, 2021." [from the op]

An LTS version supported for less than a year after being designated as LTS?


Say you pin your project at version 5. An exploit is found. There is no 5.1. Instead they fixed the exploit in 7.

However 7 breaks your pipeline.

What do you do? Fix the vulnerability yourself on an outdated version (high effort), upgrade your entire pipeline to the new version (high effort), or leave the vulnerability in place (terrible idea)?


You left out "switch to a better suited dependency (high effort)"

That's honestly the only viable choice with such libraries if you want to deploy your software in production environments.

Thankfully, there are alternatives around, and you're not forced to migrate every task at the same time.


To add to what others have said: Staying on an old version (and only upgrading when your current version reaches end of life) may be a viable strategy if you accidentally find yourself dependent on a project with policies like this, and don't like the situation.

But that's a mitigation strategy, not a place you actually want to be. I would not consider selecting a dependency with the intent of doing things that way to be a sound policy, because infrequent big-bang updates like that have a tendency to be expensive and risky. There aren't a lot of development ethics that I find less palatable than "move fast and break things," but "move slow and break things" is definitely one of them.


Well, one problem is that Celery still has bugs and issues, and each new release fixes the bug of the month (while sometimes unfortunately introducing new ones).

There was a sequence of releases in 4.x that each had their own game-breaking bug for us, which has meant we have been stuck for a while (I've tried personally contributing fixes to this project, and fixed one bug, but another got introduced in the same release).

I would be happy about more granular releases, but I hope they could make more "bugfix-only" releases so I can make some forward progress on not having a busted task queue.


To bransonf's point, it's generally in your best interest to stay as current as possible on your dependencies. If a critical severity vulnerability is discovered in a dependency, you suddenly have 72 hours (or whatever your SLA is) to patch or upgrade. If that involves jumping several major versions (each with breaking changes), you're going to have a bad time. If you stay current, hopefully the delta is smaller and the task is less herculean.


Many projects have long term support (LTS) releases which will receive security updates for 2-5 years without gaining new features. Security updates are generally not backported to other older versions, though it depends on the project/community.


This is a great release: it fixes the long-standing memory-leak bug [1] in celery beat for periodic tasks.

Combined with flower [2], it provides a simple foundation for building a powerful platform for data integration with real-time and scheduled long-running jobs. Apache Airflow also uses celery for distributed execution of tasks and jobs.

[1] https://github.com/celery/celery/issues/4843 (fixed in 5.0; will be backported to 4.x)

[2] https://flower.readthedocs.io/en/latest/


except that flower doesn't work with celery 5 at the moment :-(


This trend of wanting to move fast and break old code is exactly the opposite of what I want from my dependencies.

Look at the story of AngularJS vs React.

As a library, you're the least important part of my system; I use you to save me time so that I can focus on business logic. The moment I am constantly spending time on you is the moment I start looking for something else.


For anyone who's been looking for a simpler Task Queue for Django, check out Django-Q [1], it's been working very nicely for me. It doesn't have all the bells and whistles of Celery but it's very simple to get up and running using the DB as a broker and comes with periodic tasks out of the box.

[1] https://django-q.readthedocs.io/en/latest/


Shoutout to django-background-tasks[1] which also uses your existing database to manage the task queue. It worked really well for me.

[1] https://github.com/arteria/django-background-tasks


I also checked out this one! Can't run it on multiple servers unfortunately which disqualifies it for my use (https://github.com/arteria/django-background-tasks/issues/10...)


You might also like huey [1].

[1] https://github.com/coleifer/huey


I also evaluated Huey when I was looking around, and it seems nice. IIRC the main reasons I chose Django-Q were: 1) a built-in DB broker (Huey is mainly focused on Redis, I think); 2) Django-Q can run multiple clusters (I run in an AWS auto scaling group) without duplicating scheduled tasks; and 3) it runs on Windows (I run an OS project that is cross-platform).


May I plug my own project? As a long time Celery user, here is what Celery would be if it were written by me:

https://github.com/NicolasLM/spinach


Looks nice! Especially this bit "All Spinach workers are part of the system that schedules periodic jobs, there is no need to have a pet in the cattle farm.". The fact that you can only have one celery-beat running is one of the main reasons I chose something other than Celery.


So nice of the devs to start their release notes with one-paragraph explanation of what the product is.

A link to a more detailed description is missing, though.


Just below that explanation:

"To read more about Celery you should go read the introduction" (with link)


"What’s a Task Queue?" is not exactly what I need to understand what problems the product solves and whether I might use it in my project.


The main problem with Celery is its overuse in the python world. It's not a bad project, but it sees a lot of negativity because it's used in places where it shouldn't be! There are a lot of tutorials suggesting that "async tasks" are required whenever you interface with an external service (e.g. sending email), and most of them suggest using Celery for that.

Well, the fact is that if you are sending a couple of emails per hour for users registering on your site, you don't need async tasks. Just offload them to your SMTP server or use something like Sendgrid. It won't matter to the user whether there are a couple of seconds before they see the HTTP response or they get the email a minute later.

Even for other kinds of async work, you can usually get away with a management command and a cron job running once per minute. Suppose your users need to create a report that takes a minute to generate: just flag it for processing in your request/response cycle and run it in the management command from cron. The user is notified when it's finished.

And even if you actually do need async task functionality, start with a simpler way to run async tasks: one that isn't as complex as celery, doesn't need rabbitmq, and can even use the database as a broker, so you won't need any external parts. You almost certainly don't need to support hundreds of async tasks per minute or complex task workflows using forks, joins, etc. If you really need celery, you'll probably know it.
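The flag-and-cron pattern this comment describes can be sketched broker-free. This is a toy, with an in-memory list standing in for what would be a Django model with a status field; all names are hypothetical:

```python
# Toy in-memory "table"; in Django this would be a model with a status column.
reports = []

def request_report(user_id):
    # Called in the request/response cycle: just flag the work and return fast.
    reports.append({"user": user_id, "status": "pending", "result": None})

def run_pending_reports():
    # Called from a management command via cron (e.g. once per minute).
    for report in reports:
        if report["status"] != "pending":
            continue
        report["status"] = "running"
        # The slow work happens here, outside any web request.
        report["result"] = f"report for user {report['user']}"
        report["status"] = "done"

request_report(1)
request_report(2)
run_pending_reports()
```

A real version would also claim rows atomically (e.g. `SELECT ... FOR UPDATE SKIP LOCKED`) so multiple cron runs don't pick up the same work.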


At first I thought you were going to say that overuse of celery in the python world is a signal of how bad concurrency is in core python.


So, uhm. What _is_ new? There's a bunch of removals and updated pre-reqs, but somehow the page doesn't tell what's actually new in Celery 5.0?


If you read through it, it does contain all the necessary details about what's new in the 5.x series compared to earlier major Celery releases, and it points to additional documentation for specific details. If you are interested in the history and updates, they are in:

https://docs.celeryproject.org/en/latest/changelog.html#chan...

https://docs.celeryproject.org/en/latest/history/index.html#...


I’m with the parent here, neither of these links explains what this means for me. I don’t understand what advantages this offers, or how it improves current or future workflows. Further, it doesn’t explain the deprecations well so I don’t have a clear sense of why I want to upgrade. What should I do if I was using the AMQP backend? What advantage does click offer over the previous command line parsing? The one thing I saw was in another comment someone indicated it fixes a memory leak.


> There's a bunch of removals and updated pre-reqs

My understanding is that the Python versions that don't support asyncio have been removed, which should make the codebase cleaner and easier to maintain. So it's actually a pretty big deal.


It's a bit sad that there's still no Windows support. It would obviously impact performance, and there are a few things to keep in mind regarding worker health checks, but I don't think that justifies not supporting it (not trying to imply that they owe anyone anything, though).

A few years ago I added native Windows support to python-rq, but the maintainer couldn't accept the pull request, which was a bummer, so I just abandoned the project.


We use celery heavily, but now that our system has started growing, I wish we had used a language-agnostic message bus pub/sub system. I guess it's a different beast, but a lot of the same problems can be solved.

And if you have first-class messages instead of function calls as the data in your queue, things like decoupling into services becomes much easier.

In celery, the consumer and producer are very tightly coupled (the same code is needed on both sides).
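(Celery does offer `send_task` to enqueue by name without importing the task function, but its default ergonomics push you toward sharing code.) A broker-free toy sketch of the first-class-message alternative, where the producer only knows a task name and a JSON payload; all names here are hypothetical:

```python
import json

# Consumer side: handlers registered under stable string names.
handlers = {}

def task(name):
    def register(fn):
        handlers[name] = fn
        return fn
    return register

@task("email.send")
def send_email(to, subject):
    return f"sent {subject!r} to {to}"

# Producer side: only needs the task's name and a JSON payload,
# not an importable function. It could live in another language entirely.
def publish(name, **kwargs):
    return json.dumps({"task": name, "kwargs": kwargs})

# Worker side: decode the message and dispatch by name.
def consume(raw):
    msg = json.loads(raw)
    return handlers[msg["task"]](**msg["kwargs"])

receipt = consume(publish("email.send", to="a@b.c", subject="hi"))
```

Because the wire format is just a name plus JSON, splitting producer and consumer into separate services (or languages) only requires agreeing on the message schema.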


I ran into this when I had to combine a python project with a Laravel project (which also has its own task manager, if I dared to use it). I wanted to use Celery, but I could not find a way to integrate it with the other parts of my system. It would be nice if Celery had some kind of pluggable system that could be built upon so that disparate systems could communicate (you know, the value proposition of AMQP).


> Starting from now users should expect more frequent releases of major versions as we move fast and break things to bring you even better experience.

How can frequent breaking changes be a better experience?


If the things it's breaking are old cruft that only lead to a worse experience.


Like, if you don't change the oil in your car frequently enough, we'll make the whole car explode so you never have to suffer a bad driving experience?


My main problem with Celery is its cowboy handling of distributed computing combined with poor documentation. For one, I'm not aware of any documentation making clear what choices they made around (not-exactly-once) delivery semantics. Failure modes aren't obvious either. Do we need to make sure that tasks are idempotent? Is there any chance they'll be retried if a worker dies suddenly? What exactly happens when a worker receives SIGKILL? How does all of the above work with their Fabric (task orchestration) stuff?

I just can't trust a product that doesn't even discuss those issues prominently in the documentation. Distributed computing is inherent for a background task queue, and it's one of the hardest problems out there, so their best effort patchwork of retries and checks doesn't cut it for me. They seem to code and document for a happy case, which is a huge red flag.


I'm working on the infrastructure of a startup (Whova). Our backend is fully in Django, so our background tasks run on celery, since that's the main tool for this in the python community and a lot of legacy code is built around it. We process millions of tasks per day for various things: cpu-bound logic, io (email/push notifications, 3rd-party API calls, ...), scheduled tasks, ... These tasks are split across 20 queues processed by multiple workers on 6 dedicated machines.

My celery experience so far has been quite awful to say the least.

From the infrastructure point of view, celery has been the least reliable component of our stack by far. As others have mentioned, one problem is that celery is frequently used for things it is not meant for, and that's true for us too. If you dig deep into the documentation, celery was originally designed for short-lived, cpu-bound tasks, but it ended up being used for long-lived tasks, and processing long-lived tasks is the root of many problems until you find the correct settings to make it work. Of course, these settings are either not documented, or the documentation is useless at best (it sometimes creates even more confusion). There are also very few good-quality resources online about this.

For several months, we dealt with celery workers getting stuck and not processing tasks, celery workers running out of memory, ... until we found the correct solution. And even now, we still have some random issues we have difficulty tracking down due to the poor quality of monitoring around celery. It actually made me smile a few weeks ago when the DoorDash engineering team released a blog article about celery in which they mentioned several issues we had also encountered, including some they still have no clue about but managed to mitigate (in particular, the stuck celery queue: they needed to use -Ofair to fix the scheduling algorithm!) [1]

It's also very easy for developers to make mistakes with celery: celery routing in Django is messy (routing of individual tasks and scheduled tasks), adding new queues needs some coordination upon deployment until you automate it, generating too many scheduled tasks can make your workers run out of memory, ... Celery definitely requires solid training for all the engineers who will work with it. To be fair, this is very likely true for any background-processing tool: it's usually a critical part of the tech stack, but resources and training for it are scarce.

We are still using Celery 3. A few months ago, when they released celery 4, we looked into upgrading, but it was way more work than expected, as the entire configuration syntax had changed. The testing needed to deploy that to production was not worth it, especially considering it took us months to find the tricky settings that finally made celery somewhat stable, so why risk losing that? Now they are already at celery 5.0, and they plan to release even more breaking updates: seriously, WTF! And if you try to report issues while on celery 3, you'll just be told to upgrade.

To be frank, I believe celery is a good project, and there aren't many alternatives in python anyway. But they don't seem to listen to what their users need. There really seems to be a gap between what they expect people to do with celery and what people actually do with it. I understand it's hard to provide a good default configuration suiting everyone, but then provide appropriate documentation on how to tune celery for your use case, or clearly state the intended use case and limitations. So the last thing we need is more breaking versions and more uncertainty about celery; what we need is more documentation!

If they really go on that path, it's clear that we will eventually ditch celery for something else. Celery, from our experience, is not production friendly unless you put major efforts into it, or unless your project is fairly simple.

[1] https://doordash.engineering/2020/09/03/eliminating-task-pro...


I can definitely confirm the issues. We had memory issues, stuck tasks, and lost tasks (despite late ACK). It really requires a lot of fine-tuning to get right. We started switching to AWS SQS four years ago and created a thin Django/Python wrapper [0] for it. Now we handle hundreds of millions of tasks a day, and I don't want to think about how that would have turned out if we hadn't switched.

But honestly, people work on Celery in their free time (I think), so who am I to complain? I can see that monetary support would be necessary to make it better, but in my opinion the project just got too big, and it will be difficult to fix all the underlying issues.

[0]: https://github.com/cuda-networks/django-eb-sqs


Is there any async/promise support yet? Can I get a promise for the result of a task and await it, rather than block or poll?


Kind of, via a callback or a chord. But as others have said, these special paths get dicey depending on your broker.
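For the await part specifically, one common workaround (a sketch, not a Celery API) is to wrap the poll-based result handle in a coroutine. `FakeResult` below is a hypothetical stand-in for a Celery-style `AsyncResult` so the sketch is self-contained:

```python
import asyncio

class FakeResult:
    # Hypothetical stand-in for AsyncResult; becomes ready after a few polls.
    def __init__(self, value, polls_needed=3):
        self._value = value
        self._polls_left = polls_needed

    def ready(self):
        self._polls_left -= 1
        return self._polls_left <= 0

    def get(self):
        return self._value

async def await_result(result, poll_interval=0.01):
    # Bridge a blocking, poll-based handle into async/await by
    # yielding to the event loop between polls instead of blocking.
    while not result.ready():
        await asyncio.sleep(poll_interval)
    return result.get()

value = asyncio.run(await_result(FakeResult(42)))
```

This is still polling under the hood, just non-blocking; a true promise would require the broker to push completion events.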


If there is one wish-list item I have for Celery:

Please take care of the Redis connections. Make sure that you can control all of the client settings and make sure that it reconnects reliably.


Slightly related: how does Celery implement delayed and periodic tasks internally? Does it use its own (local) data store to keep track of tasks?


For delayed tasks, it sends those tasks to the underlying broker, and one of the workers then consumes them without acknowledging the message. That message gets held in the worker's memory until it should run.
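That holding-in-memory behavior can be sketched with a toy model (illustrative only, not Celery's actual code): the worker keeps not-yet-due messages in a min-heap ordered by ETA and only "acks" a message when its task actually runs:

```python
import heapq

class DelayedQueue:
    # Toy model of a worker holding not-yet-due messages in memory.
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so equal ETAs keep insertion order

    def deliver(self, eta, task):
        # Broker delivered a message; hold it unacked, ordered by ETA.
        heapq.heappush(self._heap, (eta, self._seq, task))
        self._seq += 1

    def run_due(self, now):
        # Pop and run everything whose ETA has passed; the "ack" to the
        # broker would happen only at this point.
        done = []
        while self._heap and self._heap[0][0] <= now:
            _, _, task = heapq.heappop(self._heap)
            done.append(task)
        return done

q = DelayedQueue()
q.deliver(eta=20, task="b")
q.deliver(eta=10, task="a")
q.deliver(eta=30, task="c")
ran = q.run_due(now=25)
```

The toy also shows the downside the parent hints at: if the worker process dies, the unacked-but-held messages depend on the broker redelivering them.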


celery is too complex. Use redis for a simpler queue.


Is this not a case where the goal of the project is incompatible with the language used to implement it?

From what I can glean from a high-level scan of the project, Celery is a process/task queue, yet the intermingling of Celery's framework with whatever a process/task's goal happens to be looks like a non-starter for any organization that does not use python like BASH.

This appears to be a poor architecture for a distributed programming framework. There are already a good number of high-quality process/task queues; I don't see a reason for this project at all.


Don’t use Celery.

It doesn’t solve a single problem that a set of bare queue consumers can’t resolve.


Like having a python programming interface to seamlessly defer tasks to arbitrary local/remote workers?


I think it's closer to "Celery is a great tool to get started, but as your product matures you'll end up falling back to a simple AMQP consumer stack":

- By default, task IDs are just pickled Python objects: if you want to change the location of a function, it might break.

- The whole process appears to be reloaded for each task: any heavy loading ends up being performed on each call.

- I couldn't find any "ops" documentation: how does it interact with RabbitMQ? How are deferred tasks implemented? What happens if a node crashes?

Although the API is nice, the product itself seems ill-suited for actual reliable, production use — at that point, it's sometimes easier to just deploy your own minimal API using the serialization format you've chosen to adopt (JSON, Protobuf, ASN.1, … ;)
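The "pickled Python objects" point is easy to demonstrate with plain pickle from the standard library (this shows generic pickle behavior, not Celery-specific code): functions and classes are serialized by import path, so a message enqueued before a refactor may fail to deserialize after it:

```python
import json
import pickle

def global_ref(module, name):
    # Hand-build a protocol-0 pickle using the GLOBAL opcode ('c'),
    # which is how pickle encodes functions and classes: by dotted
    # import path, not by value.
    return b"c" + module.encode() + b"\n" + name.encode() + b"\n."

# Loading resolves the path at unpickle time...
assert pickle.loads(global_ref("json", "dumps")) is json.dumps

# ...so if the function has moved since the message was enqueued,
# deserialization fails on the consumer.
try:
    pickle.loads(global_ref("old.module.path", "my_task"))
    moved_ok = True
except (ImportError, AttributeError):
    moved_ok = False
```

This is one reason a name-plus-JSON wire format tends to survive refactors better than pickled callables.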


I’ve found celery to be extremely useful and reliable at handling > 1MM daily async tasks with non deterministic latencies. The programming interface melds beautifully with Flask.


For the record, back in the day, Instagram used Celery at large scale: http://lc0.github.io/blog/2013/05/01/celery-messaging-at-sca...


Anecdotal, but we use Celery in a large scale production environment without too many issues.


Celery has been pretty reliable when I have used it in the past. Admittedly, I have never done anything particularly fancy with it, just running some background tasks and sending out some emails.


So it didn't solve any problems that other simpler libraries would not solve, as I have said above.


You also said that it was unreliable. Not in my experience.


"Don't use a message queue/broker/scaled out system".

"It doesn't solve a single problem that a set of queue instances on a single machine solve. Who needs any resilience?"

Weird take, that is provably incorrect and wrong.

No thanks.


I stated that Celery as a brand is horrible - I am not saying you should not use a task queue product.



