The snippets are not false, but there's so much context missing that it's easy to worsen the situation, especially for beginners, who seem to be the target audience.
First, this guide should emphasize the need to measure before doing anything: django-silk, django-debug-toolbar, etc. Of course, measure after the optimizations too, and measure in production with an APM.
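For example, a minimal django-debug-toolbar setup (development only; check the docs for your version) is just a few lines:

    # settings.py -- dev settings only
    INSTALLED_APPS += ["debug_toolbar"]
    MIDDLEWARE += ["debug_toolbar.middleware.DebugToolbarMiddleware"]
    INTERNAL_IPS = ["127.0.0.1"]

    # urls.py
    from django.urls import include, path
    urlpatterns += [path("__debug__/", include("debug_toolbar.urls"))]

django-silk is similar: an app plus a middleware, and you get per-request SQL and profiling data.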
Second, some of these only work sometimes: select_related / prefetch_related / iterator can lead to giant SQL queries with nested joins all over the place, and end up exploding RAM usage. They will help at first, but soon enough you will pay for any missing SQL knowledge or naive relationships.
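To make that concrete, a rough sketch with hypothetical Book/author/tags models:

    # select_related -> a JOIN in the same query; fine for a ForeignKey,
    # but stacking many of them is how you get the giant nested joins.
    books = Book.objects.select_related("author")

    # prefetch_related -> a separate query per relation, stitched together
    # in Python; no mega-JOIN, but every related row is held in memory.
    books = Book.objects.prefetch_related("tags")

    # iterator -> streams rows instead of caching the whole queryset;
    # note that older Django versions ignore prefetch_related here.
    for book in Book.objects.select_related("author").iterator(chunk_size=2000):
        ...

Each one changes the query shape and the memory profile differently; none of them is a free win.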
Third, caching without taking the context into account will probably lead to data corruption one way or another. Debugging stale cache issues is not fun, since you cannot reproduce them easily.
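A sketch of the kind of care it takes, with a hypothetical Product model and key scheme:

    from django.core.cache import cache

    PRICE_KEY = "product:{pk}:price"  # hypothetical key scheme

    def get_price(product_id):
        key = PRICE_KEY.format(pk=product_id)
        price = cache.get(key)
        if price is None:
            price = Product.objects.get(pk=product_id).price
            cache.set(key, price, timeout=60)  # short TTL bounds staleness
        return price

    def set_price(product, new_price):
        product.price = new_price
        product.save(update_fields=["price"])
        # Invalidate explicitly, or readers keep the old value until expiry
        cache.delete(PRICE_KEY.format(pk=product.pk))

Forget the delete (or write through another code path that doesn't know about the key) and you get exactly the stale-data bugs described above.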
Fourth, Celery is a whole new world, which requires workers, retry and idempotency logic, etc.
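Even a "simple" task ends up looking roughly like this (Order, send_receipt_email and EmailError are hypothetical):

    from celery import shared_task

    @shared_task(bind=True, acks_late=True, max_retries=3, retry_backoff=True)
    def send_receipt(self, order_id):
        order = Order.objects.get(pk=order_id)
        if order.receipt_sent:   # idempotency guard: the task may run twice
            return
        try:
            send_receipt_email(order)
        except EmailError as exc:
            raise self.retry(exc=exc)
        order.receipt_sent = True
        order.save(update_fields=["receipt_sent"])

And that's before you've provisioned the broker and the workers, or thought about monitoring them.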
Finally, scaling is also about code: architecture, good practices, basic algorithms, etc.
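No Django feature will fix something like this for you; it's plain algorithmic hygiene (Subscription and users are hypothetical):

    # O(n*m): list membership test inside a loop
    active_ids = list(Subscription.objects.values_list("user_id", flat=True))
    hits = [u for u in users if u.id in active_ids]

    # O(n+m): same result, but the set makes each lookup constant time
    active_ids = set(Subscription.objects.values_list("user_id", flat=True))
    hits = [u for u in users if u.id in active_ids]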
I'll end by linking to more complete resources:

- https://docs.djangoproject.com/en/5.1/topics/performance/
- https://loadforge.com/guides/the-ultimate-guide-to-django-pe...
- https://medium.com/django-unleashed/django-application-perfo...
Probably 80% of notable performance problems I’ve seen in the kinds of systems that things like Django and Ruby get used for have been terrible queries or patterns of use for databases (I’ve seen 1,000x or worse costs for this versus something more-correct) and nearly all of the other 20% has been areas that plainly just needed some pretty straightforward caching.
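The canonical offender is the N+1 loop, which is exactly the kind of 1,000x swing I mean (hypothetical Order/customer models):

    # N+1: one query for the orders, then one more per row in the loop
    for order in Order.objects.all():
        print(order.customer.name)

    # One query with a JOIN: same output, a fraction of the round trips
    for order in Order.objects.select_related("customer"):
        print(order.customer.name)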
The nice thing about that is that spotting those, and the basic approach to fixing them, if not the exact implementation details, are cross-platform skills that apply basically anywhere.
I actually can’t recall any other notable performance problems in those sorts of systems, over the years. Those are so common and the fixes so effective I guess the rest has just never rated attention. I’ve seen different problems in long-lived worker processes though (“make it streaming—everything becomes streaming when scale gets big enough” is the usual platform-agnostic magic bullet in those cases)
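In Python terms, "make it streaming" usually just means trading read-everything for a generator, something like (transform is a stand-in for whatever per-row work you do):

    def process(path):
        # not: rows = open(path).read().splitlines()  -- whole file in memory
        with open(path) as f:
            for line in f:           # memory stays flat regardless of file size
                yield transform(line)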
A bunch of TFA is basically about those things, so I’m not correcting it, more like nodding along.
Oh wait, I just thought of another I’ve seen: serving large files through a scripting language, as in, reading them in and writing them back out with a scripting language. You run into trouble at even modest scale. There’s a magic response header for that: make Nginx or Apache or whatever serve the file for you. It’s a fix that typically means deleting a bunch of code and replacing it with one or two lines. Or else just use S3 and maybe signed URLs like the rest of the world. Problem solved.
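The S3 flavour is a couple of lines too; bucket and key here are placeholders:

    import boto3

    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-bucket", "Key": "exports/report.pdf"},
        ExpiresIn=3600,
    )
    # Hand this URL to the client; the app process never touches the bytes.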
I have had to combine files into a zipped file on demand before. It is hard to avoid the inherent slowness of that.
Mmm. If you had the right library, might be able to stream it as it’s being created which might help at least with perceived performance, but yeah, that’s a fun one.
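Something like the third-party zipstream package does that (a sketch from memory, paths are placeholders):

    import zipstream
    from django.http import StreamingHttpResponse

    def download_zip(request):
        zf = zipstream.ZipFile(mode="w", compression=zipstream.ZIP_DEFLATED)
        zf.write("/path/to/file1.pdf")
        zf.write("/path/to/file2.pdf")
        response = StreamingHttpResponse(zf, content_type="application/zip")
        response["Content-Disposition"] = "attachment; filename=files.zip"
        return response

The first bytes go out while the rest is still being compressed, which at least fixes the perceived latency.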
Interesting, was there a business reason to not do that in the background somewhere?
Yeah, very non-technical users that won't check their email or click on a notification when the zip file is ready for them.
The magic header is probably X-Accel-Redirect
Yeah, or the kinda-better-named “x-sendfile” on apache2. Same effect.
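For the nginx case the Django side ends up roughly like this (Document is a hypothetical model, and the matching internal location has to exist in the nginx config):

    from django.http import HttpResponse

    def download(request, pk):
        doc = Document.objects.get(pk=pk)
        # auth / permission checks happen here in Python...
        response = HttpResponse()
        # ...but nginx serves the bytes from its internal location
        response["X-Accel-Redirect"] = f"/protected/{doc.filename}"
        response["Content-Disposition"] = f'attachment; filename="{doc.filename}"'
        return response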
At first I wanted to criticize the post, buuut after finishing reading it I actually liked it. Very concise and practical
PS: I didn’t know about the template “cache” directive.
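For anyone else who missed it, it's the fragment-caching tag; something like:

    {% load cache %}
    {% cache 600 sidebar request.user.username %}
        ... expensive sidebar rendering ...
    {% endcache %}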
The basic outline of this post isn’t bad, the problem is that’s all there is - a basic outline. If you haven’t dealt with these problems before the checklists are meaningless. If you HAVE dealt with these problems before the checklists are redundant
Don't store secrets in settings.py. Typically you'd inject those from secrets management as environment variables.
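i.e. something along the lines of:

    # settings.py
    import os

    SECRET_KEY = os.environ["DJANGO_SECRET_KEY"]            # fail loudly if missing
    DB_PASSWORD = os.environ.get("DATABASE_PASSWORD", "")   # or use django-environ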
An amazing movie.
This sort of article seems perfectly poised to be useless to beginners (no context, doesn't tell you how to use the things) and experts (no nuance, just listing basic features) alike. Who is it for? Why does it exist? Why is it posted here?