Pulumi is really a royal piece of shit. Why the f*ck am I writing code to do "deployment"? In C# it's `new Dictionary<string, object>` everywhere when dealing with a values.yaml, for instance. And then the whole need to figure out when and when not to use Apply.
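(For readers who haven't hit this: here's roughly what both complaints look like in Pulumi's TypeScript SDK; a minimal sketch with made-up resource names, not anyone's production setup.)

```typescript
import * as aws from "@pulumi/aws";
import * as k8s from "@pulumi/kubernetes";

// Inputs are plain object literals (the C# equivalent is where the
// Dictionary<string, object> noise comes from).
const bucket = new aws.s3.Bucket("uploads", {
  tags: { env: "dev" },
});

// apply() is only needed when transforming an Output<T>, i.e. a value
// Pulumi doesn't know until deploy time. This is the "when do I need
// Apply?" question the comment above is complaining about.
const bucketArn = bucket.arn.apply(arn => `arn was: ${arn}`);

// A Helm values.yaml becomes a nested literal rather than templated text.
const chart = new k8s.helm.v3.Chart("app", {
  chart: "nginx",
  values: { replicaCount: 2 },
});
```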
As an SRE dealing with a former Pulumi shop, "hey, devs can use code to deploy infrastructure" is not the great idea you think it is. I've seen some real ugly conditional behavior where I'm like "Is this or is this not going to run? I honestly can't tell."
We had so much conflict with the ops team over their choice of Terraform. The three colors of variable thing is just fucking bonkers. Getting tests wrapped around it that actually did what we thought they meant was a giant pain in the ass.
I won't go as far as to say we burned bridges arguing back and forth about it but they were definitely significantly singed.
Config files simply don't work until they do. And if it's your job to stare at them for hours and hours a day then maybe that's okay with you, but if you expect other people to 'just learn' it you're an idiot or an asshole. Or both. Ain't nobody got time for magic incantations.
I also think it should tell you you're on the wrong path when your app is named after a verb and the data it deals with is all declarative.
Honestly, the culture/org structure is a way bigger problem in this story than any proper noun tool.
If you’re ignoring guidance and patterns and getting mad reinventing the wheel, that’s on dev. If “ops” mandates tooling and doesn’t have any skin in the game, that’s on them. And both problems are on your leadership.
If y’all just hate each other and don’t listen or participate, then you can’t be successful. It is ironic that this is the pattern that the devops movement landed us in.
They mean var vs local vs from-a-resource. There are some places you can’t use some types of variables. It can be annoying but it’s not really a huge problem if you design your approach with that in mind.
The worst part is that the Terraform team at HashiCorp often excuses not fixing these design issues as "safety measures", which isn't entirely untrue, but when over half of your users want something, sometimes you should get over yourself.
For what it’s worth, OpenTofu is fixing many of these sorts of things that cause people pain.
But my advice is to learn to use the tool. Terraform has such great benefits (in the right use cases). If you’re struggling, either you are missing something or you chose the wrong tool for your particular job. Either way, don’t gripe that this specialized tool for infra management doesn’t work exactly like every other general purpose programming language.
That’s only the case if you spend all day rerunning deployments. If your task is more frequently to transition the cluster config from A -> B then the distinction blurs and you go from a 10:1 delta ratio of the different classes of state to maybe 3:2, at which point it feels like splitting hairs.
Especially if the locals vary between prod and pre-prod, and worse if dev sandboxes end up with per-user instances, which for us was mercifully only needed for people working on the TF scripts, so we could run our tests locally.
We have multiple separate environments per application. For environment specific inputs we use variables.
The distinction is very clear in our team. Locals are used as const (like an application name), variables are for more dynamic user/environment inputs and data is to fetch dynamic information from other resources.
Zero problems. If a local becomes more environment specific a quick refactor fixes that. You can also have locals that use variable or data values if necessary.
One big win we also have is that we stopped using modules except for one big main module. We noticed in previous projects that as soon as we introduced modules everything became a big problem. Modules that were version pinned still required a lot of maintenance to upgrade. Modules that weren't version pinned caused more destruction than we planned. Module outputs and inputs caused a lot of cycle problems... Modules always seem too deep or too shallow.
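(If it helps to see that variable/local/data trichotomy side by side, here's a minimal sketch in CDKTF TypeScript, which comes up later in the thread; the names and values are illustrative.)

```typescript
import { App, TerraformStack, TerraformVariable, TerraformLocal } from "cdktf";
import { Construct } from "constructs";

class EnvStack extends TerraformStack {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // "Variable": a dynamic per-environment input, set by the caller.
    const env = new TerraformVariable(this, "env", {
      type: "string",
      description: "Deployment environment, e.g. dev or prod",
    });

    // "Local": a const-like derived value.
    new TerraformLocal(this, "name_prefix", `myapp-${env.stringValue}`);

    // "Data": fetched from existing resources at plan time; data sources
    // come from generated provider bindings, e.g.
    // new DataAwsCallerIdentity(this, "current") with @cdktf/provider-aws.
  }
}

const app = new App();
new EnvStack(app, "env-stack");
app.synth();
```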
Seconded. As someone who really does developer / operations, depending on the project assignment, I have learned the hard way that infrastructure configuration code should be as declarative as possible.
Sure "use code to deploy infrastructure" sounds great, and that is why we get stuff like Ant, Gradle, Pulumi, Jenkins Groovy scripts, .NET Aspire,.... until someone has to debug spaghetti code on a broken deployment.
On the flip side, DSL declarative stuff is obfuscated magic that you can't step through or dive into.
A DSL like SQL involves one basic substrate (data organized in tables) that you can compile in your head. But declarative infra as code involves a thousand different things across a dozen different clouds.
Declarative will hold off spaghetti for... a bit. But it devolves to spaghetti as well (think fine-grained ACLs, or places where order of operations, which the DSL does not specify and magically resolves, becomes ambiguous).
And if you need to go off the reservation (dsl support doesn't exist or is immature for rapidly evolving platforms, need some custom postprocess steps) then you are... What?
Probably writing code and scripts to autoinvoke on the new node, phone home to a central.... Yup that's code.
Finally, declarative code has an implicit execution loop. But for something like IaC that loop is very complicated and isn't well documented. And some committed changes to declarative code may trigger a destructive pass followed by a possibly broken constructive phase.
I would agree with you, if HCL wasn't a bad language in itself:
* You can't use variables in an import block (for example, to specify a different "id" value for each workspace)
* There is no explicit way to make a resource conditional based on variables, only a hacky way using "count = foo ? 1 : 0" (see the sketch after this list)
* You can't have variables in the backend configuration, making it impossible to store states in different places depending on the environment.
* You can't have variables in the "ignore_changes" field of a resource, making it impossible to dynamically ignore changes for a field (for example, based on module variables).
* The VSCode extension for HCL is slow and buggy. Using TS with Pulumi or CDKTF makes it possible to use all the existing tooling of the language.
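(To make the conditional-resource bullet concrete: HCL simulates `if` with a `count` of 0 or 1, and the resource silently becomes a list. In a general-purpose language the same intent is a plain branch; a minimal Pulumi TypeScript sketch, with an invented config key:)

```typescript
import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";

const config = new pulumi.Config();

// The moral equivalent of HCL's `count = var.create_bucket ? 1 : 0`,
// without the resource-turns-into-a-list indexing that the trick causes.
if (config.getBoolean("createBucket")) {
  new aws.s3.Bucket("optional-bucket");
}
```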
This massively depends on your provider code. Using loops to manage TF stuff can get you into really "fun" scenarios when you want to e.g. delete an OpenStack firewall rule from the middle of the array.
I’ve been burned so many times here that I hate all of this stuff with an extreme passion.
Crossplane seems to be a genuinely better way out but there are big gotchas there also like resources that can simply never be deleted
As much as I like it, I find C# to be too inflexible of a language for infrastructure code. I tried with Pulumi for a while but moved to TypeScript as it works so much better. Structural typing makes your life a lot easier.
I bounce back and forth between javascript and C# depending on the nature of the job at hand. I'm curious what things you'd like to do with C# that you can't?
I find that with some handwringing, C# can be forced to do almost anything. between extension methods, dispatch proxies and reflection you can pummel it into basically any shape.
Having to write a little boilerplate to make it happen can be a drag though. I do sometimes wish C# had something from a blank project that let me operate with as much reckless abandon as Object.assign does in js land.
It's not the fault of the language, it's just the nature of infrastructure code that's been ported from Terraform. With Pulumi C# you end up with multiple nested objects/dictionaries and a load of `new` object calls that just add noise to your codebase. There are also some pain points with some types being Input<T>, which IDEs try to autocomplete when in reality you need to call `new T()`. TypeScript permits structural typing that _feels_ a lot better to write and read in this context.
I use C# extensively for most other things I do, but this is the one area where I prefer not to use it.
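(A concrete illustration of the structural-typing point, as a sketch: every nested level below is an anonymous literal checked against the expected shape, where the C# SDK wants a named Args class, and usually a `new`, at each level.)

```typescript
import * as aws from "@pulumi/aws";

// Each nested object here is structurally typed in place; no
// `new BucketArgs { ... }` / `new BucketVersioningArgs { ... }` ceremony.
const site = new aws.s3.Bucket("site", {
  versioning: { enabled: true },
  website: { indexDocument: "index.html" },
  tags: { team: "platform" },
});
```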
> Give me Terraform (as much as I hate it) any day
Terraform sure is a quirky little DSL ain’t it? It’s so weirdly verbose.
But at the same time I can create some Azure function app, set up my GitHub build pipeline, get Auth0 happy and in theory hook up parts of Stripe, all in one system. All those random diverse APIs plumbed together, and somehow it manages to work.
I haven't used Terraform in years (because I changed jobs, not because of the tech itself), but back in the day v0.12 solved most of my gripes. I have always wished they'd implement a better "if" syntax for blocks, because the language itself pseudo-supports it: https://github.com/hashicorp/terraform/issues/21512
But yeah, at $previous_job, Terraform enabled some really fantastic cross-SaaS integrations. Stuff like standing up a whole stack on AWS and creating a statuspage.io page and configuring Pingdom all at once. Perfect for customers who wanted their own instance of an application in an isolated fashion.
We also built an auto-approver for Terraform plans based on fingerprinting "known-good" (safe to execute) plans, but that's a story for a different day.
I get around most of the if stuff using "for each" to iterate over a map. That map might be config (usually from the Hiera data provider) or the output of another deployment. It's not generally a very flexible "if" that you need most of the time; it's more like "if this thing exists then create an X for it", or "while crafting X turn this doohickey on if that data set has this flag", which can be accomplished by munging together data with a locals for loop (which supports if statements).
Honestly, I only use terraform with hiera now, so I pretty much only write generic and reusable "wrapper" modules that accept a single block of data from Hiera via var.config. I can use this to wrap any 3rd party module, and even wrote a simple script to wrap any module by pointing at its git project.
That probably scares the shit out of folks who do the right thing and use a bunch of vars with types and defaults. But it's so extremely flexible, and it neutered all of the usual complexity and hassle I had writing terraform. I have single-handedly deployed an entire infrastructure via terraform like this, from DNS domains up through networking, k8s clusters, helm charts and a monitoring stack (and a heap of other AWS services like API Gateway, SQS, SES etc). The beauty of moving all of the data out to Hiera is that I can deploy new infra to a new region in about 2 hours, or deploy a new environment to an existing region in about 10 minutes. All of that time is just waiting for AWS to spin things up. All I have to do in code is literally "cp -a eu-west-1/production eu-west-2/production" and then let all of the "stacks" under that directory tree deploy. Zero code changes, zero name clashes, one man band.
The hardest part is sticking rigidly to naming conventions and choosing good ones. That might seem hard because cloud resources can have different naming rules or uniqueness requirements. But when you build all of your names from a small collection of hiera vars like "%{product}-%{env}-%{region}-uploads", you end up with something truly reusable across any region, environment and product.
I'm pretty sure there's no chance I'd be able to do this with Pulumi.
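(For what it's worth, the iterate-over-a-config-map pattern described above ports to a general-purpose language almost verbatim; a Pulumi TypeScript sketch, with an invented config shape standing in for the Hiera data:)

```typescript
import * as aws from "@pulumi/aws";

// Stand-in for the map that Hiera would feed in via var.config.
const queues: Record<string, { fifo: boolean; enabled: boolean }> = {
  uploads: { fifo: false, enabled: true },
  billing: { fifo: true, enabled: true },
  legacy: { fifo: false, enabled: false },
};

// "If this thing exists (and is switched on), create an X for it."
for (const [name, cfg] of Object.entries(queues)) {
  if (!cfg.enabled) continue;
  new aws.sqs.Queue(name, {
    fifoQueue: cfg.fifo,
    name: cfg.fifo ? `${name}.fifo` : name, // FIFO names must end in .fifo
  });
}
```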
Tip for naming: create a naming module where you pass in stuff like product, environment, region and service, have a bunch of locals for each thing (S3 bucket, RDS, EC2, EKS, whatever you use), then make them all outputs.
So at the top of your IaC you have module naming {variables as inputs}, then all other resources are aws_s3 { name = module.naming.s3bucket }.
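(In a language-based tool the same naming module collapses to one plain function; a sketch, with invented field names:)

```typescript
// Build every resource name from the same small set of inputs, mirroring
// the "%{product}-%{env}-%{region}-uploads" convention mentioned above.
interface NamingInputs {
  product: string;
  env: string;
  region: string;
}

function naming(n: NamingInputs) {
  const prefix = `${n.product}-${n.env}-${n.region}`;
  return {
    s3Bucket: `${prefix}-uploads`,
    rdsInstance: `${prefix}-db`,
    eksCluster: `${prefix}-cluster`,
  };
}

// Usage: const names = naming({ product: "shop", env: "prod", region: "eu-west-1" });
// then e.g. new aws.s3.Bucket(names.s3Bucket, { ... })
```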
Of course Pulumi can do for loops, you're using a proper programming language.
I meant that I doubt that I could 'cp -a' on a whole deployment tree, and deploy the copy successfully without having to make any code changes.
Although thinking about it, I take it back. It may be possible with Pulumi with the right code structure and naming conventions, if configuration were separated entirely from the codebase, and if variables were inferred from the directory structure. That is really the thing that allows me to do it.
Yes, sorry for the rather pithy response, but separating out the "what changes" vs. "what doesn't" (config vs. code in your terms) is what makes these things possible.
As you also noted, doing this in plain terraform is kind of a pain, so using a tool like Hiera allows you to skip a lot of the work involved in doing it the "right" way. IMO if you're starting greenfield Pulumi (or CDK, anything that lets you use a "real" programming language) allows you to write (or consume!) that config in basically any form, instead of needing to funnel everything through a Terraform data provider.
Yeah. I guess maybe terraform makes sense if the people writing it spend enough of their time writing HCL to master it, but I ported our terraform config to Pulumi a few years ago and never looked back. It meant I could spend way less time googling for the HCL way to do something (say, templated resource) and just use the JS primitives I already know.
>spend enough of their time writing HCL to master it
Making Terraform changes every six weeks was enough time that we forgot everything and had to refresh our memories. Every time it felt like going into the water at a northern beach and forgetting how goddamned cold the water was, then reproaching yourself for forgetting.
Helm charts are a horrible example of text based templating.
You have the YAML/JSON that the k8s API wants, which is fed through Helm, which is fed through helmsman or whatever newer thing. There might be a layer or two of other templating around that. Sometimes companies have built systems so developers/devops don't even have the ability to see what the final compiled version of the template is, which is the mother of all "works on my laptop" problems.
It's super easy to break text based templating because of some space, tab, string escaping or whatever.
YAML makes it worse, as there are lots of gotchas and different ways of doing things. JSON, being quite verbose and inflexible, at least has strong structure right in your face, so it's a bit easier to figure out what went wrong.
With a proper programming language's data structures you can do much better at verifying that the things you add, remove or iterate over will produce a valid result, with much better refactoring and the ability to work as a team independently.
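(Concretely, the difference is between splicing strings into indentation-sensitive text and building a real data structure that only gets serialized at the very end; a minimal sketch with plain TypeScript objects, emitting JSON since the k8s API accepts JSON as readily as YAML:)

```typescript
// A Deployment built as a typed object: no nindent, no escaping, and a
// typo in a field name is caught before anything is applied.
interface Deployment {
  apiVersion: "apps/v1";
  kind: "Deployment";
  metadata: { name: string };
  spec: {
    replicas: number;
    selector: { matchLabels: Record<string, string> };
    template: {
      metadata: { labels: Record<string, string> };
      spec: { containers: { name: string; image: string }[] };
    };
  };
}

function makeDeployment(name: string, image: string, replicas: number): Deployment {
  const labels = { app: name };
  return {
    apiVersion: "apps/v1",
    kind: "Deployment",
    metadata: { name },
    spec: {
      replicas,
      selector: { matchLabels: labels },
      template: { metadata: { labels }, spec: { containers: [{ name, image }] } },
    },
  };
}

// Serialize at the very end; values containing ":" get quoted for free.
console.log(JSON.stringify(makeDeployment("web", "nginx:1.27", 2), null, 2));
```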
I once got a nil pointer exception when I updated a helm chart. I wondered why the hell am I getting a nil pointer exception for updating a YAML file. After some investigation I found an issue on GitHub where the maintainers said the Go team says this is an intended behavior for some case in Go templates.
That isn't a typical nil/null exception like in JavaScript, Ruby, and Python. That's in a language where a lot of values are non-nullable, and some of the ones that are have zero-values that can be used without getting a nil pointer exception. https://go.dev/tour/moretypes/12
So there's a good chance it was an error that was really unexpected, and it's better to show the error than to risk producing bad output.
I’m not sure why nobody invented a way to dynamically update values.yaml based on what you are writing in the template file. And maybe vice versa. It would be such a time saver. Maybe someone did, but I haven’t found it yet.
Tried Pulumi thinking "it's gonna abstract all the k8s specifics". Welp no, still need to know and understand K8s so I still don't see the value from those kind of tools. In which case why not use something like Pkl to generate my yaml from some sensible code-like structures?
kubernetes is very complex, and therefore any abstraction which completely glosses over the way the underlying systems work would find it very hard to avoid leaking, or would be a bad abstraction to begin with.
the complexity must in one way or another be preserved within the abstraction (in all likelihood), or you will have cases you cannot create in that layer, or breakages which now require the total complexity of both the abstraction itself AND kubernetes itself to fix.
i would not say IaC is going to provide you a magic solution to learning k8s, although the value in using IaC (e.g. Argo CD / Flux CD + Kustomize + ...) in K8s land is that you are no longer imperatively managing your cluster resources and therefore can keep them within a repository, managed like code. the point of the solution is not to make it easier for newcomers, but to make it easier to have teams manage and work together on an established cluster for deployments, ...
in the case of Pulumi, you leverage the single language with typechecking instead of relying upon K8s flavoured YAML, which is itself beneficial in many ways (since you can use your regular developer tooling)
wrt pkl, pretending the K8s manifest structure underneath isn't there does not help, because you will need to know how the keys within a manifest interact with the underlying system regardless, especially to understand functionality, e.g. node selectors, taints and tolerations, node affinity, ...
i previously managed a terraform-based deployment of several k8s clusters and it still required knowledge of those keys and values, alongside knowledge of the underlying resource types.
without those you can't implement things like GPU-based node selection for jobs which require a GPU, ...
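(To ground the GPU example: the keys are the same ones you'd write in raw YAML, just typechecked; a Pulumi TypeScript sketch where the image, node label and taint values are all illustrative.)

```typescript
import * as k8s from "@pulumi/kubernetes";

// Same nodeSelector/tolerations knowledge as raw YAML, but typechecked.
const job = new k8s.batch.v1.Job("train", {
  spec: {
    template: {
      spec: {
        restartPolicy: "Never",
        containers: [{
          name: "train",
          image: "my-registry/train:latest", // illustrative image
          resources: { limits: { "nvidia.com/gpu": "1" } },
        }],
        nodeSelector: { accelerator: "nvidia-gpu" }, // assumed node label
        tolerations: [{
          key: "nvidia.com/gpu",
          operator: "Exists",
          effect: "NoSchedule",
        }],
      },
    },
  },
});
```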
It is "imperative", not interactive, sorry. From Wiki:
"There are generally two approaches to IaC: declarative (functional) vs. imperative (procedural). The difference between the declarative and the imperative approach is essentially 'what' versus 'how'."
For anyone deliberating between Pulumi and CDK, let me recommend what I consider the best of both worlds: CDKTF, Hashicorp’s answer to Pulumi (my words, not theirs).
It’s got everything you want:
- strong type system (TS),
- full expressive power of a real programming language (TS),
- can use every existing terraform provider directly,
- compiles to actual Terraform so you can always use that as an escape hatch to debug any problems or interface with any other tools,
- official backing of Hashicorp so it’s a safe bet
It’s a super power for infra. If you have strong software dev skills and you want to leverage the entire TF ecosystem without the pain of Terraform the language, CDKTF is for you.
CDKTF is good, but it's not amazing. You are still constrained by Terraform semantics like `count = condition ? 1 : 0` instead of a normal `if` statement. And there's a fairly good amount of times where you need to use Terraform iterators instead of a normal for/forEach/map/reduce.
But all in all, it works. It's just a bit limited on what you can do with the actual language.
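(For anyone curious what that constraint looks like: when the collection is only known at apply time, a native forEach can't see the values, so the loop has to be handed back to Terraform as an iterator. A sketch, assuming the prebuilt @cdktf/provider-aws bindings:)

```typescript
import { App, TerraformStack, TerraformIterator, TerraformVariable } from "cdktf";
import { Construct } from "constructs";
import { AwsProvider } from "@cdktf/provider-aws/lib/provider";
import { S3Bucket } from "@cdktf/provider-aws/lib/s3-bucket";

class BucketsStack extends TerraformStack {
  constructor(scope: Construct, id: string) {
    super(scope, id);
    new AwsProvider(this, "aws", { region: "eu-west-1" });

    // Resolved by Terraform at plan/apply time, not at synth time...
    const bucketNames = new TerraformVariable(this, "bucket_names", {
      type: "list(string)",
    });

    // ...so the loop is delegated to Terraform (compiles to HCL for_each)
    // instead of being a normal TypeScript for/forEach.
    const it = TerraformIterator.fromList(bucketNames.listValue);
    new S3Bucket(this, "buckets", {
      forEach: it,
      bucket: it.value,
    });
  }
}

const app = new App();
new BucketsStack(app, "buckets");
app.synth();
```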
> - full expressive power of a real programming language (TS)
I suppose TypeScript does count as a real programming language, in that it’s Turing complete. But I can use Pulumi from (they claim) any programming language. Specifically, I can use it from Go. Why would I add TypeScript to my project when I can live in one language?
> - official backing of Hashicorp so it’s a safe bet
Given the number of folks leaving the Hashicorp platform, I think it’s arguably no longer a ‘safe bet.’
The Go SDK is a lot more verbose for configuration (pulumi.String, etc.) and then you have error handling boilerplate as well. Exceptions are a better match for creating resources in Pulumi.
Because you can use that to interface with existing tooling. Terraform has a huge and established ecosystem and it’s an uphill battle to compete with it. It’s risky to bet your infra on a tech that tries to drink the ocean and supplant the entire thing. Meanwhile if you compile down to TF you get to use a different language without having to pay the cost of moving out of the tf ecosystem. And given that the language itself is by far the worst thing about terraform that’s a big win.
It turns out terraform is actually quite acceptable when you slap a decent language on top of it. Passable, even :)
We've been migrating off of Terraform at BigCo recently and it has been a tremendous success. The migration has saved countless hours. Before, I was jaded and routinely in the office until 8 or 9 or so manually running terraform deploys for our engineering teams in India. Now, thanks to Pulumi, I'm able to leave the office at 7:30-8 -- and I can tell you single handed that this has saved my relationship with my daughter and maybe even my marriage. I'm running the fastest for loops thanks to Pulumi. We actually compile our Python down to c and use the Pulumi C SDK for insane speed benefits when we loop over our datacenter arrays. Turns out, not having bounds checks shaves off valuable time that I would otherwise be spending with my daughter. Routinely I'd be waking up screaming at 4 in the morning due to Terraform (or, what we would refer to as Tearaform because all of the infra engineers were constantly in tears). Now, I can sleep soundly until 5:30.
Thanks for sharing your story, it sounds like you had a really rough time with Terraform.
I don't have much experience running Terraform at scale. What has Pulumi made easier? Why is looping a bottleneck in infrastructure code?
Based on the info I can glean from this story you may be working at a scale / use case that may be too big or a poor fit for Terraform but I'm not sure...
What's your argument here? For example, Typescript allows lots of operations on objects that cannot be known at compile time because it relies on the user to inform it of types accurately, anything can be coerced into anything without complaint with "as", and it allows for arbitrary operations on an "any" type without complaint.
I've heard it referred to as an "optionally typed" or "gradually typed" system, which, having worked for years in TypeScript and other languages like Rust and Kotlin, I agree with.
I wish CDK was fully baked enough to actually use. It's still missing coverage for some AWS services (sometimes you have to do things in cloudformation, which sucks) and integrating existing infra doesn't work consistently. Oh and it creates cloudformation stacks behind the scenes and makes for troubleshooting hell.
CDK is an abomination and I'm not sure why AWS is pushing it now. A few years ago all their Quick Starts were written in CloudFormation, now it's CDK that compiles to CloudFormation. Truly a bad idea.
Just write CloudFormation directly. Once you get the hang of the declarative style and become aware of the small gotchas, it's pretty comfy.
I also had a really rough go with cdk. I personally found the lack of upsert functionality -- you can't use a resource if it exists or create if it doesn't -- to make it way more effort than I felt was useful. Plus a lack of useful error messages... maybe I'm dumb, but I can't recommend it to small companies.
Upserting resources is an antipattern in cloud resource management. The idiom that works best is to declare all the resources you use and own their lifecycle from cradle to grave.
The problem with upserting is that if the resource already exists, its existing attributes and behavior might be incompatible with the state you're declaring. And it's impossible to devise a general solution that safely transitions an arbitrary resource from state A to state A' in a way that is sure to honor your intent.
If you don't mind sharing, suppose (because it's what I was doing) I was trying to create personal dev, staging, and prod environments. I want the usual suspects: templated entries in route53, a load balancer, a database, some Fargate, etc.
If they're all meant to look alike, you'd deploy the stack (or app, in CDK parlance) into your dev, staging, and prod accounts. You'd get the same results in each.
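(In CDK TypeScript terms, something like the sketch below: one stack class, instantiated once per account/environment; the account IDs are placeholders.)

```typescript
import * as cdk from "aws-cdk-lib";
import { Construct } from "constructs";

// One definition of the "usual suspects" (Route53 entries, load balancer,
// database, Fargate service, ...), written once.
class AppStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props: cdk.StackProps) {
    super(scope, id, props);
    // ...define the shared resources here...
  }
}

const app = new cdk.App();
// Identical environments by construction: same class, different account.
new AppStack(app, "Dev", { env: { account: "111111111111", region: "eu-west-1" } });
new AppStack(app, "Staging", { env: { account: "222222222222", region: "eu-west-1" } });
new AppStack(app, "Prod", { env: { account: "333333333333", region: "eu-west-1" } });
app.synth();
```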
Plain Podman systemd integration is way more powerful and secure, as it does not mess with the firewall and allows you to run rootless containers as services. It's even possible to run healthchecks and enforce building images just before starting the service, making on-demand containers via systemd-socket-proxyd possible. Check this example: https://github.com/Mati365/hetzner-podman-bunjs-deploy
It looks like you don't even care about opening the documentation before pressing reply. Podman is a simple hammer without any moving parts that, used properly, can be used to build fancy stuff without much knowledge.
Precisely. I've been implementing some kind of blue-green deployment with both systemd and dockerd, but it was an imperfect and incomplete solution. Kamal put much more effort into it and it seems more convenient and reliable (but I haven't tried it yet in production).
Ah yes my favourite thing to have to do, rolling my own deploys and rollbacks.
It’s stuff like this that’s just a thousand papercuts that dissuades me from using these “simpler” tools. By the time you’ve rebuilt by hand what you need, you’ve just created a worse version of the “more complex” solution.
I get it if your workload is so simple or low-requirement that zero-downtime deploys, rollbacks, health/liveness, automatic volumes, monitoring etc. are features you don’t want or need, but “it’s just as good, just DIY all the things” doesn’t make it a viable alternative in my mind.
Sure, but Kamal getting all those features means it strays close to Kubernetes in complexity, and it quickly becomes "Why not Kubernetes? At least that is massively popular with a ton of support."
You could certainly implement Kamal just with Ansible and Docker Compose. It's just an abstraction that does it for you and handles all the edge-cases. (Kamal doesn't use Ansible, it has its own SSH lib).
Pulumi's genAI-based documentation is trash. I've moved to Terraform and I was able to achieve much better results in a shorter time thanks to Terraform's higher-quality documentation.
Worth noting that most of the Terraform documentation for classic Pulumi providers (providers built on top of TF providers) is still relevant to Pulumi.
Keep an eye on reachability and performance. I’ve seen DO consistently perform terribly and/or drop connections for months (that is, didn’t look like some brief routing glitch somewhere) for some US and Canadian routes (not, like, Sri Lanka or something) on excellent Internet connections. The fix was moving to AWS, problem gone. It felt like a shitty-peering-agreements issue.
From the client side. You can’t know what it should be like without knowing the client.
I’m sure there are lots of DO clients seeing the same things we did, but not realizing it.
We did see it (multiple DCs—we didn’t just not try to fix this before going to AWS) in multiple cases with tens of clients so if there’s good news it’s that if you can monitor like 100 clients distributed over a wide area and all of them behave as expected you may not be experiencing what we did. What we saw was closer to 5% with absurd slowness or frequently-dropped connections than to 0.01%.
And if you are just operating a website and sticking Cloudflare or whatever in front of DO anyway, this doesn’t matter. I expect that’s why it’s not a more widely-reported issue.
Please change the title text unless you add some discussion of the cost differences to the page you linked. However useful your tool is, nothing on this page mentions AWS or costs.
Why's everyone going away from declarative? Terraform, CloudFormation, AWS Copilot etc have a lot of virtues and are programming language agnostic.
Using a complex programming language (C++ of the browser world) just for this has a big switching cost. Unless you're all in on TS. And/or have already built a huge complex IaC tower of babel where programming-in-the-large virtues justify it.
- more imperative background developers need to work with infrastructure and they bring over their mindset and ways of working
- infrastructure is more and more available through APIs, and it saves a lot of effort to dynamically iterate over cattle rather than declaratively deal with pets
- things like conditionals, loops and abstractions are very useful for a reason
- in essence the declarative tools are not flexible enough for many use cases or ways of working, using a programming language brings infinite flexibility
Personally I am more in the declarative camp and see the benefits of it, but there is a certain amount of banging one's head against its rigidity.
Complex programming languages for infrastructure code get used when people who are more comfortable using complex programming languages to solve their problems are given the problem of infrastructure and ops.
It is classic "every problem is a nail to the person with a hammer". Complex languages - by definition - can solve a wider variety of problems than a simple declarative language but - by definition - are less simple.
Complex languages for infra - IMO - are the wrong tool for the wrong job because of the wrong skills and the wrong person. The only reason why inefficiencies like this are ever allowed to happen is money.
"Why hire a dev and an ops when we can hire a single devops for fractionally less?" - some excited business person or some broken dev manager, probably.
Declarative has in-practice meant “programming, but in YAML” more often than not, which is hell. YAML’s not even a good format for static data, and it’s even worse when you try to program in it.
Terraform isn't really declarative. It's declarative right up until the point at which it isn't, where it falls apart. I need a declarative deployment right up to the application layer, which is where terraform fails.
It seems to me that there's not a big difference in the number of files. You can have a single template in CF or Terraform files, and similarly you can split your CDK code across many files, or not.
(For bigger stuff, apparently CF has some limits relating to resources per single stack.)
Just learn CloudFormation. It’s not that hard, and if you really want to write code, you can implement custom resources for all the times the service team let you down.
CDK is a second-class citizen; it is missing implementations for many services and features. CDK was DOA: it should have been a requirement that when AWS added something to CloudFormation it needed to be added to CDK as well.
AWS service teams provide cloud formation support before CDK support in many cases, so eventually CDK users run into situations where they need to look at CF
Hetzner has been our "expensive AWS cloud costs" saviour
We've also started switching our custom Docker compose + SSL GitHub Action deployments to use Kamal [1] to take advantage of its nicer remote monitoring features
I’ve been pretty happy with something like Docker Compose or Docker Swarm and Portainer, but honestly it’s nice that there are other alternatives that strive for something manageable and not too complex!
One thing about managing EKS with Pulumi, Terraform, etc.: if you deploy things like Istio that make changes to infrastructure, then do a terraform destroy, no luck; you are hunting down security groups or other assets Istio generated that TF doesn't know about. Good times.
CDK APIs in JavaScript are very nice. It's a much, much better developer experience than Pulumi/Terraform and even Serverless Framework.
In our monorepo each service is in a separate folder, with a folder called /infrastructure inside containing a file called Stack.js that defines all the resources needed. When starting a new service we just copy one of the last similar services that we developed. We are able to deploy a new service in hours. Services keep getting better with the accumulation of nice-to-have features that you wouldn't have time to add to most services.
My DO K8S cluster ist bugging me every couple of months to do an upgrade. I am always scared to just run it but moving shit over to a new cluster instead is so much work that I simply gamble on it. AWS ECS is worth over penny
DO's K8S is more equivalent to AWS's EKS offering, so of course ECS which abstracts away pretty much all of the other parts of K8s is going to require less maintenance. It's sort of a false equivalence to say ECS == that solution.
On EKS, you need to do the same version updates with the same amount of terror.
You do pay extra for the further management of just running containers somewhere!
(you might want to say "every" instead of over, "is" instead of "ist")
I definitely want to say is instead of ist but it is bugging me every couple of months. You do the upgrade and 6 months later it needs another one. No LTS in sight
My life on AWS the last five or so years really would have been a lot simpler if every new generation of EC2 servers didn't have the exact same ratio of RAM to cores.
At this point the memory:vCPU ratio is the defining characteristic of the main general-purpose C/M/R series; I'd think it would be pretty disruptive to change that significantly now. And they also have the special extra-high-memory X series available. I would say EC2 is pretty flexible in this regard: you have options for 2/4/8/16/32 gigabytes per vCPU. It's mostly a problem if you need even less memory than what the C series provides, or need some special features.
As products age they tend to use more memory. Add in space/time tradeoffs asking to use more. You either get stuck applying the brakes trying to keep the memory creep at bay, or you give in and jump to 2x the memory pool which will disappear too.
The old on-prem solution was to populate machines with 2/3 to 3/4 of their max addressable memory and push back on the expensive upgrade as long as possible, or at least until memory prices came down for the most expensive modules. Then faster hard drives or new boxes were the next step.
You don't choose EKS because it's easy to manage. You choose it because you intend to use the bevy of other AWS hosted services. The clusterfuck of management is directly related to that.
The alternative, which I feel is far too common (and I say this as someone who directly benefits from it): You choose AWS because it's a "Safe" choice and your incubator gets you a bunch of free credits for a year or two. You pay nothing for compute for the first year, but instead pay a devops guy a bunch to do all the setup - In the end it's about a wash because you have to pay a devops guy to handle your CI and deploy anyway, you're just paying a little more in the latter.
What's your issue with EKS? I operate several very simple and small single-tenant clusters, and I have to touch the infrastructure only once a year for updates
I wouldn't even use DO for that, unless it's like a private server for just your friends.
I won't touch DO after they took my droplet offline for 3 hours because I got DDoS'd by someone that was upset that I banned them from an IRC channel for spamming N-bombs and other racial slurs.
While yes, it was more than ten years ago, we can see that such stupidity is woven into their DNA as a company.
TL;DR: where a cloud provider hosts customers for which there are real-world consequences for data leakage, not a single customer can be at risk of data leakage. It's a different line of thinking, almost "a different world", between those who have this line of thinking and those who don't.
"The thing about reputations is you only have one".
By contrast even more than ten years before that, AWS was publishing whitepapers about how all contents of RAM to be used by a VM are initialized before a VM is provisioned, and other efforts to proactively scrub customer data.
I worked at a niche cloud provider a bit over ten years ago. We used Intel QAT for client-side encryption of our network-attached pools of SSD. We were able to offer all-SSD at low cost and without security blindspots via crypto key rotation implemented by compartmentalized teams, and also physical infrastructure compartmentalization patterns. About half a decade later we found we were second only to AWS, and almost second (but ahead in other ways) to some smaller cloud-style hosting provider.
> While yes, it was more than ten years ago, we can see that such stupidity is woven into their DNA as a company.
I don't know if it really meets that bar, but I won't argue about that right now. I'm just going to ask again for your definition of "real cloud" and whether you can suggest some that don't price gouge bandwidth (and aren't oracle, I would not consider them worthy of trust either).
Pulumi is really a royal piece of shit. Why the f*ck am I writing code to do "deployment". In C# --> new Dictionary<string, object> when dealing with a values.yaml for instance. The whole need to figure out when and when not to use Apply.
Give me Terraform (as much as I hate it) any day.
As SRE dealing with former Pulumi, "Hey Devs can use code to deploy infrastructure" is not great idea you think it is. I've seen some real ugly conditional behavior where I'm like "Is this or is this not going to run? I honestly can't tell."
We had so much conflict with the ops team over their choice of Terraform. The three colors of variable thing is just fucking bonkers. Getting tests wrapped around it that actually did what we thought they meant was a giant pain in the ass.
I won't go as far as to say we burned bridges arguing back and forth about it but they were definitely significantly singed.
Config files simply don't work until they do. And if it's your job to stare at them for hours and hours a day then maybe that's okay with you, but if you expect other people to 'just learn' it you're an idiot or an asshole. Or both. Ain't nobody got time for magic incantations.
I also think it should tell you you're on the wrong path when your app is named after a verb and the data it deals with is all declarative.
Ever thought that "Ops" needs a different mindset than the devs are used to ?
And that’s why we don’t delegate that work to devs.
Honestly, the culture/org structure is a way bigger problem in this story than any proper noun tool.
If you’re ignoring guidance and patterns and getting mad reinventing the wheel, that’s on dev. If “ops” mandates tooling and doesn’t have any skin in the game, that’s on them. And both problems are on your leadership.
If y’all just hate each other and don’t listen or participate, then you can’t be successful. It is ironic that this is the pattern that the devops movement landed us in.
Honestly curious, I've been writing terraform for a while but I have never heard of "The three colors of variable thing". Could you expand on that?
They mean var vs local vs from-a-resource. There are some places you can’t use some types of variables. It can be annoying but it’s not really a huge problem if you design your approach with that in mind.
The worst part is that the Terraform team at Hashicorp often excuse not fixing these design issues as “safety measures” which isn’t entirely untrue but when over half of your users want something, sometimes you should get over yourself.
For what it’s worth, OpenTofu is fixing many of these sorts of things that cause people pain.
But my advice is to learn to use the tool. Terraform has such great benefits (in the right use cases). If you’re struggling, either you are missing something or you chose the wrong tool for your particular job. Either way, don’t gripe that this specialized tool for infra management doesn’t work exactly like every other general purpose programming language.
That makes sense I guess, I just never considered locals or data resources as variables.
Same, locals are in my head like consts. You define it and it stays that way. A shortcut for a repeated value.
Data resources are you requesting a dynamic value of your environment.
Variables are dynamic values that a user can change.
That’s only the case if you spend all day rerunning deployments. If your task is more frequently to transition the cluster config from A -> B then the distinction blurs and you go from a 10:1 delta ratio of the different classes of state to maybe 3:2, at which point it feels like splitting hairs.
Especially if the locals vary between prod and pre-prod, and worse if dev sandboxes end up with per-user instances, which for us was mercifully only needed for people working on the TF scripts, so we could run our tests locally.
We have multiple separate environments per application. For environment specific inputs we use variables.
The distinction is very clear in our team. Locals are used as const (like an application name), variables are for more dynamic user/environment inputs and data is to fetch dynamic information from other resources.
Zero problems. If a local becomes more environment specific a quick refactor fixes that. You can also have locals that use variable or data values if necessary.
One big win we also have is that we stopped using modules except for one big main module. We noticed from previous projects that as soon as we implemented modules everything became a big problem. Modules that are version pinned still required a lot of maintenance to upgrade. Modules that weren't version pinned caused more destruction than we planned. Modules outputs and inputs caused a lot of cycle problems,... Modules always seem too deep or too shallow.
So would you go opentofu or pulumi or Sir Not Appearing in This Film?
Seconded, as someone that really does developer / operations, depending on the project assignment, I have learned the hard way that infrastructure configuration code should be as declarative as possible.
Sure "use code to deploy infrastructure" sounds great, and that is why we get stuff like Ant, Gradle, Pulumi, Jenkins Groovy scripts, .NET Aspire,.... until someone has to debug spaghetti code on a broken deployment.
On the flip side dsl declarative stuff is obfuscated magic that you can't step through or drive into.
a dsl like SQL involves one basic substrate (data organized in tables) that you can compile in your head. But declarative infra as code involves a thousand different things across a dozen different clouds.
Declarative will hold off spaghetti for... A bit. But it devolves to spaghetti as well (think fine grained acls, or places where order of operations, which the dsl does not specify and is magically resolved, becomes ambiguous).
And if you need to go off the reservation (dsl support doesn't exist or is immature for rapidly evolving platforms, need some custom postprocess steps) then you are... What?
Probably writing code and scripts to autoinvoke on the new node, phone home to a central.... Yup that's code.
Finally, declarative code has an implicit execution loop. But for something like iac that is a very complicated, the execution loop that isn't well documented. And some committed changes to declarative code May trigger a destructive pass followed by a possibly broken constructive phase.
It's a tough problem.
I would agree with you, if HCL wasn't a bad language in itself:
* You can't make have variables in an import block (for example, to specify a different "id" value for each workspace)
* There is no explicit way to make a resource conditional based on variables. Only a hacky way to do that using "count = foo ? 1 : 0"
* You can't have variables in the backend configuration, making it impossible to store states in different places depending on the environment.
* You can't have variables in the "ignore_changes" field of a resource, making it impossible to dynamically ignore changes for a field (for example, based on module variables).
* The VSCode extension for HCL is slow and buggy. Using TS with pulumi or TFCDK makes it possible to use all the existing tooling of the language.
For Terraform, most of the issues with conditionals can be resolved by creating dictionaries dynamically and looping through it to generate resources.
You get the bonus of controlling the resource id and being able to selectively delete resources without worrying about ordering.
This massively depends on your provider code. Using loops to manage tf stuff can you you into really “fun” scenarios when you want to e.g delete an openstack firewall rule from the middle of the array.
I’ve been burned so many times here that I hate all of this stuff with an extreme passion.
Crossplane seems to be a genuinely better way out but there are big gotchas there also like resources that can simply never be deleted
As much as I like it, I find C# to be too inflexible of a language for infrastructure code. I tried with Pulumi for a while but moved to TypeScript as it works so much better. Structural typing makes your life a lot easier.
I bounce back and forth between javascript and C# depending on the nature of the job at hand. I'm curious what things you'd like to do with C# that you can't?
I find that with some handwringing, C# can be forced to do almost anything. between extension methods, dispatch proxies and reflection you can pummel it into basically any shape.
Having to write a little boilerplate to make it happen can be a drag though. I do sometimes wish C# had something from a blank project that let me operate with as much reckless abandon as Object.assign does in js land.
It's not the fault of the language, it's just the nature of infrastructure code that's been ported from terraform. With Pulumi C# you end up with multiple nested objects/dictionaries with a load of `new` object calls that just add noise to your codebase. There's also some pain points with some types being Input<T> which IDEs try to autocomplete when in reality you need to call `new T()`. Typescript permits structural typing that _feels_ a lot better to write and read within this context.
I use C# extensively for most other things I do, but this the one area where I prefer not to use it.
> Give me Terraform (as much as I hate it) any day
Terraform sure is a quirky little DSL ain’t it? It’s so weirdly verbose.
But at the same time I can create some azure function app, setup my GitHub build pipeline, get auth0 happy and in theory hook up parts of stripe all in one system. All those random diverse API’s plumbed together and somehow it manages to work.
But boy howdy is that language weird.
I haven't used Terraform in years (because I changed jobs, not because of the tech itself), but back in the day v0.12 solved most of my gripes. I have always wished they'd implement a better "if" syntax for blocks, because the language itself pseudo-supports it: https://github.com/hashicorp/terraform/issues/21512
But yeah, at $previous_job, Terraform enabled some really fantastic cross-SaaS integrations. Stuff like standing up a whole stack on AWS and creating a statuspage.io page and configuring Pingdom all at once. Perfect for customers who wanted their own instance of an application in an isolated fashion.
We also built an auto-approver for Terraform plans based on fingerprinting "known-good" (safe to execute) plans, but that's a story for a different day.
I get around most of the if stuff using "for each" to iterate over a map. That map might be config (usually from the hiera data provider) or the output of another deployment. It's not generally a very flexible "if" that you need most of the time, it's more like "if this thing exists then create an X for it", or "while crafting X turn this doohickey on of that data set has this flap", which can be accomplished my munging together days with a locals var for loop (which support if statements).
Honestly, I only use terraform with hiera now, so I pretty much only write generic and reusable "wrapper" modules that accept a single block of data from Hiera via var.config. I can use this to wrap any 3rd party module, and even wrote a simple script to wrap any module by pointing at its git project.
That probably scares the shit out of folks who do the right thing, and use a bunch of vars with types and defaults. But it's so extremely flexible and it neutered all of the usual complexity and hassle I had writing terraform. I have single handedly deployed an entire infrastructure via terraform like this, from DNS domains up through networking, k8s clusters, helm charts and monitoring stack (and a heap of other AWS services like API Gateway, SQS, SES etc). The beauty of removing all of the data out to Hiera is that I can deploy new infra to a new region in about an 2 hours, or deploy a new environment to an existing region in about 10 minutes. All of that time is just waiting for AWS to spin things up. All I have to do in code is literally "cp -a eu-west-1/production eu-west-2/production" and then let all of the "stacks" under that directory tree deploy. Zero code changes, zero name clashes, one man band.
The hardest part is sticking rigidly to naming conventions and choosing good ones. That might seem hard because cloud resources can have different naming rules or uniqueness requirements. But when you build all of your names from a small collection of hiera vars like "%{product}-%{env}-%{region}-uploads", you end up with something truly reusable across any region, environment and product.
I'm pretty sure there's no chance I'd be able to do this with Pulumi.
Tip for naming, create a naming module where you pass in stuff like product, environment, region, service, have a bunch of locals for each thing like S3 bucket, RDS, EC2, EKS whatever you use then make them all outputs.
So at top of your IaC, you have module naming {variables as inputs} then all other resources are aws_s3 { name = module.naming.s3bucket }
In pulumi
Of course Pulumi can do for loops, you're using a proper programming language.
I meant that I doubt that I could 'cp -a' on a whole deployment tree, and deploy the copy successfully without having to make any code changes.
Although thinking about it, I take it back. It may be possible with Pulumi with the right code structure and naming conventions, and if configuration were separated entirely from the codebase, and if variables were inferred from the directory structure. That is really the thing that allows me do to it.
Yes, sorry for the rather pithy response, but separating out the "what changes" vs. "what doesn't" (config vs. code in your terms) is what makes these things possible.
As you also noted, doing this in plain terraform is kind of a pain, so using a tool like Hiera allows you to skip a lot of the work involved in doing it the "right" way. IMO if you're starting greenfield Pulumi (or CDK, anything that lets you use a "real" programming language) allows you to write (or consume!) that config in basically any form, instead of needing to funnel everything through a Terraform data provider.
Yeah. I guess maybe terraform makes sense if the people writing it spend enough of their time writing HCL to master it, but I ported our terraform config to Pulumi a few years ago and never looked back. It meant I could spend way less time googling for the HCL way to do something (say, templated resource) and just use the JS primitives I already know.
>spend enough of their time writing HCL to master it
Making Terraform changes every six weeks was enough time that we forgot everything and had to refresh our memories. Every time it felt like going into the water in a northern beach and forgetting how goddamned cold the water was, then reproaching yourself for forgetting.
Why are people templating yaml for terraform like they templated html in php in 1996?
Because it works fine, and is also used in for other things like Helm Charts?
https://helm.sh/docs/chart_template_guide/control_structures...
Helm charts are a horrible example of text based templating.
You have YAML/JSON that k8s API wants, that is fed through helm which is fed through helmsman or whatever newer thing. There might be a layer or two of other templating around. Sometimes companies have built systems so developers/devops don't even have the ability to see what the final compiled version of the template is which is like the mother of all: "works on my laptop" problems.
It's super easy to break text based templating because of some space, tab, string escaping or whatever.
YAML makes it worse as there are lots of gotchas and different ways of doing. JSON, being quite verbose and inflexible at least has strong structure right in your face so it's a bit easier to figure out what went wrong.
With a proper programming language data structure you can be much better with verifying that the things you add or remove or iterate over will produce a valid result, much better refactoring and working as a team independently.
> Helm charts are a horrible example of text based templating.
Every time I see " | nindent whatever" I'm asking why the fuck the tool cannot manage indentation.
And it breaks every time a variable gets a `:` inside of it and now you are producing invalid yaml everywhere you forgot to call `| toYaml`.
I once got a nil pointer exception when I updated a helm chart. I wondered why the hell am I getting a nil pointer exception for updating a YAML file. After some investigation I found an issue on GitHub where the maintainers said the Go team says this is an intended behavior for some case in Go templates.
Wasn't fun.
That isn't a typical nill/null exception, like in JavaScript, ruby, and python. That's in a language where a lot of values are non-nullable, and some of the ones that are have zero-values that can be used without getting a nil pointer exception. https://go.dev/tour/moretypes/12
So, there's a good chance was an error that was really unexpected and it's better to show the error than to risk producing bad output.
Never read anything more true in my life!
I’m not sure why nobody invented a way to dynamically update values.yaml based on what are writing in the template file. And maybe vice-versa. It would be such a time saver. Maybe someone did, but I didn’t find it yet.
This
Tried Pulumi thinking "it's gonna abstract all the k8s specifics". Welp no, still need to know and understand K8s so I still don't see the value from those kind of tools. In which case why not use something like Pkl to generate my yaml from some sensible code-like structures?
kubernetes is very complex and therefore any abstraction which completely glosses over the way the underlying systems work would make it very hard to avoid leaking or a bad abstraction to begin with.
the complexity in one way or another must be preserved within the abstraction (in all likelihood) or you will have cases you cannot create in that layer or breakages which now have the total complexity of both the abstraction itself AND kubernetes itself required to fix.
i would not say IaC is going to provide you a magic solution to learning k8s, although the value in using IaC (e.g. Argo CD / Flux CD + Kustomize + ...) in K8s land is that you are no longer imperatively managing your cluster resources and therefore can keep them within a repository, managed like code. the point of the solution is not to make it easier for newcomers, but to make it easier to have teams manage and work together on an established cluster for deployments, ...
in the case of Pulumi, you leverage the single language with typechecking instead of relying upon K8s flavoured YAML, which is itself beneficial in many ways (since you can use your regular developer tooling)
wrt pkl, pretending K8s manifest structure underneath does not help because you will need to know how the keys within a manifest interact with the underlying system regardless, especially to understand functionality, e.g. node selectors, taints and tolerations, node affinity, ...
i prior managed a terraform-based deployment of several k8s clusters and it still required knowledge of those keys and values, alongside knowledge of the underlying resource types.
without those you can't implement things like GPU-based node selection for jobs which require a GPU, ...
What about pulumi's declarative yaml interface which can be exported from type-safe languages like cue? https://www.pulumi.com/blog/extending-pulumi-languages-with-...
> Give me Terraform (as much as I hate it) any day.
Just use CloudFormation. Easy to write, declarative, vars (Parameters and Output exports). Trick is not to pile everything in one Stack. Use several.
CDK is much better to express this. Why cfn?
Less lines, easier to read, declarative (cdk is interactive, less predictable).
And it generates shitty CFN, we can do better ourselves :)
How is cdk interactive? I use cdk and have it auto build and deploy.
It is "imperative", not interactive, sorry. From Wiki:
"There are generally two approaches to IaC: declarative (functional) vs. imperative (procedural). The difference between the declarative and the imperative approach is essentially 'what' versus 'how'."
https://en.wikipedia.org/wiki/Infrastructure_as_code#Types_o...
Apply is really straightforward. The dictionary stuff is very annoying overhead but it’s nice keeping everything in one language.
For anyone deliberating between Pulumi and CDK, let me recommend what I consider the best of both worlds: CDKTF, Hashicorp’s answer to Pulumi (my description, not theirs).
It’s got everything you want:
- strong type system (TS),
- full expressive power of a real programming language (TS),
- can use every existing terraform provider directly,
- compiles to actual Terraform so you can always use that as an escape hatch to debug any problems or interface with any other tools,
- official backing of Hashicorp so it’s a safe bet
It’s a super power for infra. If you have strong software dev skills and you want to leverage the entire TF ecosystem without the pain of Terraform the language, CDKTF is for you.
(No affiliation)
https://developer.hashicorp.com/terraform/cdktf
CDKTF is good, but it's not amazing. You are still constrained by Terraform semantics like `count = condition ? 1 : 0` instead of writing a normal `if` statement, and there are a fair number of times when you need Terraform iterators instead of a normal for/forEach/map/reduce.
But all in all, it works. It's just a bit limited in what you can do with the actual language.
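To make the distinction concrete: values known when the TypeScript runs can use plain language constructs, while values only known at apply time push you back into Terraform-level mechanisms. A rough sketch, assuming the prebuilt @cdktf/provider-aws bindings:

    import { App, TerraformStack, TerraformIterator } from "cdktf";
    import { Construct } from "constructs";
    import { AwsProvider } from "@cdktf/provider-aws/lib/provider";
    import { S3Bucket } from "@cdktf/provider-aws/lib/s3-bucket";
    import { DataAwsAvailabilityZones } from "@cdktf/provider-aws/lib/data-aws-availability-zones";

    class ExampleStack extends TerraformStack {
      constructor(scope: Construct, id: string) {
        super(scope, id);
        new AwsProvider(this, "aws", { region: "eu-west-1" });

        // Known at synth time: a plain if statement works fine.
        if (process.env.CREATE_AUDIT_BUCKET === "true") {
          new S3Bucket(this, "audit", { bucket: "example-audit-bucket" });
        }

        // Known only at apply time: a plain forEach won't work,
        // you have to reach for a Terraform iterator instead.
        const zones = new DataAwsAvailabilityZones(this, "zones", {});
        const iterator = TerraformIterator.fromList(zones.names);
        new S3Bucket(this, "per-az", {
          forEach: iterator,
          bucket: iterator.value,
        });
      }
    }

    const app = new App();
    new ExampleStack(app, "example");
    app.synth();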
> - full expressive power of a real programming language (TS)
I suppose TypeScript does count as a real programming language, in that it’s Turing complete. But I can use Pulumi from (they claim) any programming language. Specifically, I can use it from Go. Why would I add TypeScript to my project when I can live in one language?
> - official backing of Hashicorp so it’s a safe bet
Given the number of folks leaving the Hashicorp platform, I think it’s arguably no longer a ‘safe bet.’
The Go SDK is a lot more verbose for configuration (pulumi.String, etc.), and then you have error-handling boilerplate as well. Exceptions are a better match for creating resources in Pulumi.
How is compiling to terraform a positive? I'd rather debug python than python-compiled-to-terraform.
Because you can use that to interface with existing tooling. Terraform has a huge and established ecosystem and it’s an uphill battle to compete with it. It’s risky to bet your infra on a tech that tries to drink the ocean and supplant the entire thing. Meanwhile if you compile down to TF you get to use a different language without having to pay the cost of moving out of the tf ecosystem. And given that the language itself is by far the worst thing about terraform that’s a big win.
It turns out terraform is actually quite acceptable when you slap a decent language on top of it. Passable, even :)
Makes sense! Except for one little thing..
We've been migrating off of Terraform at BigCo recently and it has been a tremendous success. The migration has saved countless hours. Before, I was jaded and routinely in the office until 8 or 9 or so manually running terraform deploys for our engineering teams in India. Now, thanks to Pulumi, I'm able to leave the office at 7:30-8 -- and I can tell you single handed that this has saved my relationship with my daughter and maybe even my marriage. I'm running the fastest for loops thanks to Pulumi. We actually compile our Python down to c and use the Pulumi C SDK for insane speed benefits when we loop over our datacenter arrays. Turns out, not having bounds checks shaves off valuable time that I would otherwise be spending with my daughter. Routinely I'd be waking up screaming at 4 in the morning due to Terraform (or, what we would refer to as Tearaform because all of the infra engineers were constantly in tears). Now, I can sleep soundly until 5:30.
Thanks for sharing your story, it sounds like you had a really rough time with Terraform.
I don't have much experience running Terraform at scale. What has Pulumi made easier? Why is looping a bottleneck in infrastructure code?
Based on the info I can glean from this story, you may be working at a scale / use case that is too big or a poor fit for Terraform, but I'm not sure...
I think he's kidding... there's no C CDK:
https://www.pulumi.com/docs/iac/languages-sdks/
In an AWS scenario I can think of:
Pro vs pulumi: you get a declarative template to debug and review
Pro vs CDK: The declarative template is applied via APIs instead of CloudFormation. The CDK CloudFormation abstraction leaks like hell
Does Typescript offer a strong type system?
Yes
What's your argument here? For example, Typescript allows lots of operations on objects that cannot be known at compile time because it relies on the user to inform it of types accurately, anything can be coerced into anything without complaint with "as", and it allows for arbitrary operations on an "any" type without complaint.
I've heard it referred to as an "optionally typed" or "gradually typed" system, which, having worked for years in TypeScript and other languages like Rust and Kotlin, I agree with.
Pretty easy to add runtime validation at the edges with Zod https://github.com/colinhacks/zod
Great thing is that the Zod schema also doubles as your TypeScript type, so you don't have to write a duplicate/shadow TS type definition.
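Something like this, where the schema is the single source of truth for both the runtime check and the static type (the config shape is made up for the example):

    import { z } from "zod";

    // Validated at runtime at the edges (env vars, API responses, ...).
    const ConfigSchema = z.object({
      region: z.string(),
      instanceCount: z.number().int().positive(),
    });

    // ...and the same schema doubles as the compile-time type.
    type Config = z.infer<typeof ConfigSchema>;

    const config: Config = ConfigSchema.parse(
      JSON.parse(process.env.DEPLOY_CONFIG ?? "{}"),
    );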
That doesn't make Typescript as a language "strongly typed".
I wish CDK was fully baked enough to actually use. It's still missing coverage for some AWS services (sometimes you have to do things in cloudformation, which sucks) and integrating existing infra doesn't work consistently. Oh and it creates cloudformation stacks behind the scenes and makes for troubleshooting hell.
> sometimes you have to do things in cloudformation, which sucks
All of CDK does things in cloudformation, which made the whole thing stillborn as far as I’m concerned.
The CDK team goes to some lengths to make it better, but it’s all lambda based kludges.
so like every other aws "solution"
CDK is an abomination and I'm not sure why AWS is pushing it now. A few years ago all their Quick Starts were written in CloudFormation, now it's CDK that compiles to CloudFormation. Truly a bad idea.
Just write CloudFormation directly. Once you get the hang of the declarative style and become aware of the small gotchas, it's pretty comfy.
> Just write CloudFormation directly. Once you get the hang of the declarative style and become aware of the small gotchas, it's pretty comfy.
Exactly this. And don't make huge templates; split things logically across several stacks and pass values via Export/ImportValue.
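e.g. export in one template and import in the other; names here are invented:

    # network-stack.yaml
    Outputs:
      VpcId:
        Value: !Ref Vpc
        Export:
          Name: network-VpcId

    # app-stack.yaml
    Resources:
      AppSecurityGroup:
        Type: AWS::EC2::SecurityGroup
        Properties:
          GroupDescription: App security group
          VpcId: !ImportValue network-VpcId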
The biggest hurdle I've encountered is cross-stack resource sharing, especially in case of bidirectional dependencies like KMS keys and IAM roles.
The biggest hurdle is when you want to refactor your stacks, and you pretty well just can't without risking deleting everything.
> you pretty well just can't, without risk of deleting everything
This is one hyper annoying area.
It is possible to get around it, but it's ugly, drop to L1 and override logical id:
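Something along these lines, in TypeScript, with names invented; the point is pinning the logical ID that CloudFormation already knows, so the refactor isn't treated as a delete + recreate:

    import { App, Stack } from "aws-cdk-lib";
    import { Bucket, CfnBucket } from "aws-cdk-lib/aws-s3";

    class DataStack extends Stack {
      constructor(scope: App, id: string) {
        super(scope, id);
        const bucket = new Bucket(this, "MyRefactoredBucket");
        // Drop to the L1 construct and keep the old logical ID.
        const cfnBucket = bucket.node.defaultChild as CfnBucket;
        cfnBucket.overrideLogicalId("MyBucketOldLogicalId1A2B3C4D");
      }
    }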
You have to do this literally for every resource that's refactored. For us, we run two stacks. One basically cannot/should not be deleted or refactored: VPC, RDS, critical S3 buckets, i.e. critical data.
The second stack runs the software, and all of those resources can be destroyed, moved, whatever, without any data loss.
+1, CDK refactoring is annoying and ugly.
In my experience you'd need to read the CDK source code to find the offending node and call `overrideLogicalId`.
There is a library to do it in a nicer way: https://github.com/mbonig/cdk-logical-id-mapper
However, it does not work in every case.
> we run 2 stacks. One that basically cannot/should-not be deleted/refactored. VPC, RDS, critical S3 buckets
Why, dear god, did you put VPC and RDS in one stack? They are much better off as separate CFN stacks.
There are deletion protection flags that can be enabled.
But circular dependencies can also lead to issues here where CDK will prevent you from deleting a resource used or referenced by a different stack.
I also had a really rough go with CDK. I personally found the lack of upsert functionality (you can't use a resource if it exists, or create it if it doesn't) made it way more effort than I felt was useful. Plus a lack of useful error messages... maybe I'm dumb, but I can't recommend it to small companies.
Upserting resources is an antipattern in cloud resource management. The idiom that works best is to declare all the resources you use and own their lifecycle from cradle to grave.
The problem with upserting is that if the resource already exists, its existing attributes and behavior might be incompatible with the state you're declaring. And it's impossible to devise a general solution that safely transitions an arbitrary resource from state A to state A' in a way that is sure to honor your intent.
Hmm.
If you don't mind sharing, suppose (because it's what I was doing) I was trying to create personal dev, staging, and prod environments. I want the usual suspects: templated entries in route53, a load balancer, a database, some Fargate, etc.
What are you meant to do here? Thank you.
If they're all meant to look alike, you'd deploy the stack (or app, in CDK parlance) into your dev, staging, and prod accounts. You'd get the same results in each.
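Roughly: one stack class, one instance per environment. A sketch where the account IDs are placeholders and the stack body is a hypothetical stand-in for the ALB/RDS/Fargate/Route53 bits:

    import { App, Stack, StackProps } from "aws-cdk-lib";
    import { Construct } from "constructs";

    // Hypothetical stack; the real one would define the usual suspects.
    class WebAppStack extends Stack {
      constructor(scope: Construct, id: string, props?: StackProps) {
        super(scope, id, props);
        // ...load balancer, database, Fargate service, Route53 records...
      }
    }

    const app = new App();
    // Same definition, deployed once per environment/account.
    new WebAppStack(app, "Dev",     { env: { account: "111111111111", region: "us-east-1" } });
    new WebAppStack(app, "Staging", { env: { account: "222222222222", region: "us-east-1" } });
    new WebAppStack(app, "Prod",    { env: { account: "333333333333", region: "us-east-1" } });
    app.synth();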
Can't use Bun to deploy CDK; CDK fails as it looks exclusively for package-lock, yarn.lock, or pnpm's lockfile.
So dumb. Trying to move to SST for that reason alone.
But if you add cdk to the path you can still deploy; it's just that your CI/CD and deployment scripts are not all using Bun anymore.
Hmm, beyond a bug Bun had between versions 1.0.8 and 1.1.20 [0], it has otherwise worked perfectly fine for me.
You have to make a few adjustments, which you can see here: https://github.com/codetalkio/bun-issue-cdk-repro?tab=readme...
- Change app/cdk.json to use bun instead of ts-node (see the snippet below)
- Remove package-lock.json + existing node_modules and run bun install
- You can now use bun run cdk as normal
[0]: https://github.com/codetalkio/bun-issue-cdk-repro
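If I remember the repro correctly, the cdk.json change is just swapping the app command, something like:

    {
      "app": "bun run bin/my-app.ts"
    }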
mmm, I wonder how hard that would be to fix in a PR.
Actually a good idea, didn't think about it.
Kubernetes no thanks. Terraform + Kamal [1] on Digital Ocean is the way I deploy/run apps now.
[1] https://kamal-deploy.org/
Plain Podman systemd integration is way more powerful and secure, as it does not mess with the firewall and lets you run rootless containers as services. It's even possible to run healthchecks and enforce building images just before starting the service, making on-demand containers via systemd-proxyd possible. Check this example: https://github.com/Mati365/hetzner-podman-bunjs-deploy
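For instance, a minimal quadlet sketch (names invented) dropped into ~/.config/containers/systemd/ for a rootless user session; assumes a reasonably recent Podman:

    # ~/.config/containers/systemd/web.container
    [Unit]
    Description=Rootless web app container

    [Container]
    Image=docker.io/library/nginx:alpine
    PublishPort=8080:80
    # Podman-native healthcheck, surfaced to systemd
    HealthCmd=wget -qO- http://localhost:80/ || exit 1

    [Service]
    Restart=always

    [Install]
    WantedBy=default.target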
> way more powerful and secure
I don't care about powerful. That's the opposite of what I want. I could just use k8s if I cared about that.
It looks like you don't even care about opening the documentation before pressing reply. Podman is a simple hammer without any moving parts that, used properly, can build fancy stuff without much knowledge.
I'm aware of what Podman and Systemd are. Apparently you are not aware of what Kamal is. Open documentation, then press reply.
Be nice folks, we are all here to learn :)
Does it support zero downtime deploys?
Why not? Install Traefik or any other load balancer, set up two services, and restart them one after the other.
https://kamal-deploy.org/docs/configuration/proxy/
I think GP's point was that Kamal has all of these things already, so you don't have to set them up.
Precisely. I've implemented some kind of blue-green deployment with both systemd and dockerd, but it was an imperfect and incomplete solution. Kamal has put much more effort into it, and it seems more convenient and reliable (though I haven't tried it in production yet).
Ah yes, my favourite thing to have to do: rolling my own deploys and rollbacks.
It’s stuff like this, a thousand papercuts, that dissuades me from using these “simpler” tools. By the time you’ve rebuilt by hand what you need, you’ve just created a worse version of the “more complex” solution.
I get it if your workload is so simple or low-requirement that zero-downtime deploys, rollbacks, health/liveness, automatic volumes, monitoring, etc. are features you don’t want or need, but “it’s just as good, just DIY all the things” doesn’t make it a viable alternative in my mind.
Sure, but Kamal getting all those features means it strays close to Kubernetes in complexity, and it quickly becomes "Why not Kubernetes? At least that is massively popular with a ton of support."
I disagree. An opinionated tool can be as powerful as, but much simpler than a generic tool.
Kamal is doing most of this, but on a single node. This is the limitation that differentiates it from k8s, but also makes it much simpler.
I've looked into Kamal, but it feels like "it's as complex as Kubernetes but isn't, so support is going to be nightmarish."
Why is this better than Ansible + Docker Compose?
You could certainly implement Kamal just with Ansible and Docker Compose. It's just an abstraction that does it for you and handles all the edge-cases. (Kamal doesn't use Ansible, it has its own SSH lib).
Technically, it’s not much different from using Ansible to run Docker on remote hosts.
What it provides is a set of conventions based on what most web apps look like.
Eg. built-in proxy with automatic TLS and zero downtime deployments, first-class support for a DB and cache, encrypted secrets, etc.
It’s definitely not for every use case, but for your typical 3-tier monolith on a handful of servers I found it does the job well.
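The whole convention fits in one small file. A trimmed sketch of a config/deploy.yml (service, image, and host are invented):

    service: myapp
    image: myuser/myapp

    servers:
      web:
        - 192.168.0.1

    # Built-in reverse proxy: automatic TLS and zero-downtime deploys
    proxy:
      ssl: true
      host: myapp.example.com

    registry:
      username: myuser
      password:
        - KAMAL_REGISTRY_PASSWORD   # read from the environment, kept out of the repo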
Kamal is simply NIH K8s made by an unreliable company with poor leadership. No thanks, not for my prod infra!
Pulumi's genAI-based documentation is trash. I moved to Terraform and was able to achieve much better results in less time thanks to Terraform's far better documentation.
Worth noting that most of the Terraform documentation for classic Pulumi providers (providers built on top of TF providers) is still relevant to Pulumi.
Hi everyone,
We've gone through a lot of pain to get this blueprint working since our AWS costs were getting out of hand but we didn't want to part ways with CDK.
We've now got the same stack structure going with Pulumi and Digital Ocean, with the same ease of development and at least a 60% cost reduction.
Keep an eye on reachability and performance. I’ve seen DO consistently perform terribly and/or drop connections for months (that is, didn’t look like some brief routing glitch somewhere) for some US and Canadian routes (not, like, Sri Lanka or something) on excellent Internet connections. The fix was moving to AWS, problem gone. It felt like a shitty-peering-agreements issue.
People will pretend that this quality difference doesn’t exist in networking, uptime, server quality.
It’s not a drop in replacement. It might be worth it depending on what you’re doing.
Frustratingly, it’s also something that doesn’t meaningfully appear on any features list or comparison sheet.
How do you monitor the connection quality?
From the client side. You can’t know what it should be like without knowing the client.
I’m sure there are lots of DO clients seeing the same things we did, but not realizing it.
We did see it (multiple DCs—we didn’t just not try to fix this before going to AWS) in multiple cases with tens of clients so if there’s good news it’s that if you can monitor like 100 clients distributed over a wide area and all of them behave as expected you may not be experiencing what we did. What we saw was closer to 5% with absurd slowness or frequently-dropped connections than to 0.01%.
And if you are just operating a website and sticking Cloudflare or whatever in front of DO anyway, this doesn’t matter. I expect that’s why it’s not a more widely-reported issue.
Please change the title text unless you add some discussion of the cost differences to the page you linked. However useful your tool is, nothing on this page mentions AWS or costs.
I don’t think Digital Ocean is all that much better for pricing, but using Pulumi over CDK is a pure win as far as I’m concerned.
Agreed. On the bright side, I was able to migrate managed k8s on DO to managed k8s in GCP with very minimal work since it was managed via pulumi.
Yeah, I've been really disappointed with Digital Ocean so far. Not just from a pricing perspective but from a customer service perspective.
Anyone using CDK should switch to Pulumi though.
Perhaps Pulumi with Vultr is also worth a look.
Why's everyone going away from declarative? Terraform, CloudFormation, AWS Copilot, etc. have a lot of virtues and are programming-language agnostic.
Using a complex programming language (the C++ of the browser world) just for this has a big switching cost, unless you're all in on TS and/or have already built a huge, complex IaC tower of babel where programming-in-the-large virtues justify it.
> Why's everyone going away from declarative?
If I had to guess it's because
- more developers from an imperative background need to work with infrastructure, and they bring their mindset and ways of working with them
- infrastructure is more and more available through APIs, and it saves a lot of effort to dynamically iterate over cattle rather than declaratively deal with pets
- things like conditionals, loops and abstractions are very useful for a reason
- in essence, the declarative tools are not flexible enough for many use cases or ways of working; using a programming language brings near-infinite flexibility
Personally I am more in the declarative camp and see its benefits, but there is a certain amount of banging one's head against its rigidity.
Complex programming languages for infrastructure code get used when people who are more comfortable using complex programming languages to solve their problems are given the problem of infrastructure and ops.
It is classic "every problem is a nail to the person with a hammer". Complex languages - by definition - can solve a wider variety of problems than a simple declarative language but - by definition - are less simple.
Complex languages for infra - IMO - are the wrong tool for the wrong job because of the wrong skills and the wrong person. The only reason why inefficiencies like this are ever allowed to happen is money.
"Why hire a dev and an ops when we can hire a single devops for fractionally less?" - some excited business person or some broken dev manager, probably.
Declarative has in-practice meant “programming, but in YAML” more often than not, which is hell. YAML’s not even a good format for static data, and it’s even worse when you try to program in it.
Terraform isn't really declarative. It's declarative right up until the point at which it isn't, where it falls apart. I need a declarative deployment right up to the application layer, which is where terraform fails.
Because they like to spend endless hours debugging infrastructure builds.
A small CDK project is a lot more readable in my opinion. It doesn't have a ton of YAML files with your config spread across them.
It seems to me that there's not a big difference in the number of files. You can have a single template in CF or Terraform, and similarly you can split your CDK code across many files, or not.
(For bigger stuff, CF apparently has limits on the number of resources per stack.)
Because sometimes you just need a for loop in a way that Terraform's for_each and the rest of its DSL don't support.
Declarative does not equate to config files.
The property that equates to config files is being static, which modern deployments are not.
Controversial opinion here: just use CDK. Learn cloud formation for advanced stuff. It’s really not that hard and pays dividends
Just learn CloudFormation. It’s not that hard, and if you really want to write code, you can implement custom resources for all the times the service team let you down.
CDK is a second-class citizen; it is missing implementations for many services and features. CDK was DOA, as it should have been a requirement that when AWS added something to Terraform it was added to CDK as well.
In my experience, AWS CloudFormation is more limited in the resources and APIs it exposes than any of the CDKs.
AWS service teams ship CloudFormation support before CDK support in many cases, so eventually CDK users run into situations where they need to drop down to CFN.
Hetzner has been our "expensive AWS cloud costs" saviour
We've also started switching our custom Docker compose + SSL GitHub Action deployments to use Kamal [1] to take advantage of its nicer remote monitoring features
[1] https://kamal-deploy.org
I’ve been pretty happy with something like Docker Compose or Docker Swarm and Portainer, but honestly it’s nice that there are other alternatives that strive for something manageable and not too complex!
One thing about managing EKS with Pulumi, Terraform, etc.: if you deploy things like Istio that make changes to infrastructure, a terraform destroy is no luck; you end up hunting down security groups or other assets Istio generated that TF doesn't know about. Good times.
This title text is nowhere on the linked page. Please get rid of the editorialization. DO is not that much cheaper for a baseline instance.
Pulumi is very neat with straight AWS, too. I suspect this is the primary use case.
CDK APIs in JavaScript are very nice. It's a much, much better developer experience than Pulumi/Terraform and even Serverless Framework. In our monorepo each service lives in a separate folder with an /infrastructure folder inside, containing a Stack.js file that defines all the resources needed. When starting a new service we just copy one of the most recent similar services we developed. We are able to deploy a new service in hours. Services keep getting better with the accumulation of nice-to-have features that you wouldn't otherwise have time to add.
This doesn’t sound good to me. Would you do the same with some functional code rather than creating an external versioned library?
With Terraform or CDK, I would want a simple shareable thing that did the boilerplate and that I called with whatever variables I needed to change.
My DO K8S cluster ist bugging me every couple of months to do an upgrade. I am always scared to just run it but moving shit over to a new cluster instead is so much work that I simply gamble on it. AWS ECS is worth over penny
DO's K8S is more equivalent to AWS's EKS offering, so of course ECS which abstracts away pretty much all of the other parts of K8s is going to require less maintenance. It's sort of a false equivalence to say ECS == that solution.
On EKS, you need to do the same version updates with the same amount of terror.
You do pay the extra for the further management to just run containers somewhere!
(you might want to say "every" instead of over, "is" instead of "ist")
I definitely want to say is instead of ist but it is bugging me every couple of months. You do the upgrade and 6 months later it needs another one. No LTS in sight
It’s only “insane costs” if you don’t know what you’re doing.
Or you need a good amount of RAM, which should be really cheap these days.
My life on AWS the last five or so years really would have been a lot simpler if every new generation of EC2 servers didn't have the exact same ratio of RAM to cores.
At this point the memory:vCPU ratio is the defining characteristic of the main general-purpose C/M/R series; I'd think it would be pretty disruptive to change that significantly now. And they also have the special extra-high-memory X series. I'd say EC2 is pretty flexible in this regard: you have options for 2/4/8/16/32 gigabytes per vCPU. It's mostly a problem if you need even less memory than the C series provides, or need some special features.
As products age they tend to use more memory. Add in space/time tradeoffs asking for more, and you either get stuck applying the brakes, trying to keep the memory creep at bay, or you give in and jump to 2x the memory pool, which will disappear too.
The old on-prem solution was to populate machines with 2/3 to 3/4 of their max addressable memory and push back on the expensive upgrade as long as possible, or at least until prices came down for the most expensive modules. Then faster hard drives or new boxes were the next step.
RAM in the cloud is expensive because it's the only thing that still can't be over-provisioned performantly, afaik.
And even if you do, it's usually a system design problem that you're maintaining.
On one hand I can see how this is an unfalsifiable standard; on the other hand, I can see the utility of smoothing a friction point for people who messed up.
EKS has become a clusterf*ck to manage and provision. This looks very useful. Bare metal k8s, even running on EC2, might be another option.
You don't choose EKS because it's easy to manage. You choose it because you intend to use the bevy of other AWS hosted services. The clusterfuck of management is directly related to that.
The alternative, which I feel is far too common (and I say this as someone who directly benefits from it): You choose AWS because it's a "Safe" choice and your incubator gets you a bunch of free credits for a year or two. You pay nothing for compute for the first year, but instead pay a devops guy a bunch to do all the setup - In the end it's about a wash because you have to pay a devops guy to handle your CI and deploy anyway, you're just paying a little more in the latter.
What's your issue with EKS? I operate several very simple and small single-tenant clusters, and I have to touch the infrastructure only once a year for updates
You can also simplify Kubernetes to just Kamal and things become instantly easier...
I personally love Terraform. It's easy to use, and its rigid framework actually leads to fewer mistakes and far more readable code than Pulumi.
Anyone use Garnix? https://garnix.io/
This looks too experimental for me to trust with production deployments.
Is this an Ad?
GitHub has been littered with developer relations growth hacks recently.
I strongly recommend sst.dev
Digital Ocean isn't really a "real" cloud. Maybe use Digital Ocean if you're hosting video game servers, but no serious business should be on it.
I wouldn't even use DO for that, unless it's like a private server for just your friends.
I won't touch DO after they took my droplet offline for 3 hours because I got DDoS'd by someone that was upset that I banned them from an IRC channel for spamming N-bombs and other racial slurs.
When was this? DO and Linode now promise full DDoS protection.
What's your definition of real cloud?
And can you name a real cloud that charges a half-reasonable price for bandwidth? I consider $10/TB to be half-reasonable.
Ideally one that doesn't have these kinds of issues:
https://news.ycombinator.com/item?id=6983097
That was more than ten years ago, I don't think that tells us about current quality.
While yes, it was more than ten years ago, we can see that such stupidity is woven into their DNA as a company.
TL;DR: where a cloud provider hosts customers for whom there are real-world consequences to data leakage, not a single customer can be at risk of data leakage. It's a different line of thinking, almost "a different world", between those who think this way and those who don't.
"The thing about reputations is you only have one".
By contrast even more than ten years before that, AWS was publishing whitepapers about how all contents of RAM to be used by a VM are initialized before a VM is provisioned, and other efforts to proactively scrub customer data.
I worked at a niche cloud provider a bit over ten years ago. We used Intel QAT for client-side encryption of our network-attached pools of SSD. We were able to offer all-SSD at low cost and without security blind spots through crypto key rotation implemented by compartmentalized teams, plus physical infrastructure compartmentalization patterns. About half a decade later, we found we were second only to AWS, and nearly second to (though ahead of in other ways) a smaller cloud-style hosting provider.
> While yes, it was more than ten years ago, we can see that such stupidity is woven into their DNA as a company.
I don't know if it really meets that bar, but I won't argue about that right now. I'm just going to ask again for your definition of "real cloud" and whether you can suggest some that don't price-gouge bandwidth (and aren't Oracle; I would not consider them worthy of trust either).