Episode 13: ECS – A Tale of Dos Equis

As we settle in with a nice tall glass of Amazonian Kool-Aid, we're going to talk about ECS - the Amazon Elastic Container Service. How can you run containers without a lot of fuss? ECS! We'll also touch on some EKS, mostly because Brian keeps bringing it up and spoiling episode 14.

Brian Seguin
Hello, you’re listening to Rent, Buy, Build. I’m Brian Seguin.
James Hunt
And I’m James Hunt.
Brian Seguin
Today we’re talking about ECS, otherwise known as the Amazon Elastic Container Service. Don’t ask me why they just cut it off to be, you know, the ECS portion and they don’t do A-ECS. That also might be the version of the Canadian one.
James Hunt
I think that’s ECS-eh.
Brian Seguin
Oh, sorry. I had it reversed. So, from a management standpoint, a lot of people are trying to figure out: okay, what is ECS? And why do I also hear the word EKS out there, and Fargate, and Lambda, all those different things you can use to deploy and run applications inside of Amazon? What does all this mean? I’ll start with my understanding: ECS, the Amazon Elastic Container Service, is basically just a fancy Docker-running thingy inside of Amazon, and EKS is the Elastic Kubernetes Service, which is really just Amazon’s Kubernetes platform. So it gives you a little bit more orchestration tooling and things to do stuff with. But I think we’ll get into a little more of what ECS does in this episode, right?
James Hunt
That’s the plan. So yeah, we’re gonna talk about ECS, and we’re gonna have to talk about EKS. But we’re only really gonna get into EKS insofar as to differentiate it from ECS. And it’s not just one letter off; it’s actually a whole system off. They both do containers, but they do them in very different ways, and a lot of that is due to history.

So, to refresh the timeline in the listener’s head: Docker was launched in spring of 2013. That was when we started seeing actual containers the way we see containers today: a Dockerfile, or a manifest of, put all this stuff in this image, move the image with the container orchestration bits, so that I bring my file system along, and the instructions, and all of my code, and everything, and it’s one big, nice package that can execute. As soon as Docker became viable, we started seeing a whole bunch of different ways to orchestrate containers, because at its core, just Docker is a lot like just having a process. You can spin up a container, it’ll do what it needs to do, and when the main process, the init process, PID 1, exits, the container shuts down. That’s great in that it is a fundamental building block of a larger strategy. But most people need their applications to keep running, and applications sometimes die or shut off for reasons outside of our control. Servers reboot. Bugs are tickled by input or timeouts or other things; programs crash, they panic, they otherwise terminate when we don’t expect them to. And from an availability perspective, that’s not great. We would like our containers to come back. Early Docker didn’t really have that option, or that idea. So a lot of orchestrators and a lot of systems were built around that, and ECS is one of those, right?

So ECS launches in 2015. The GA of the ECS service in Amazon actually coincides with the announcement slash launch of the Kubernetes open source project. Same time. So Amazon had been working on a containerization platform for a while; as of summer 2015, they’ve been working on it, they’re doing Docker, they’ve got all these additional higher-level concepts built around the primitive of the container. And then as soon as they GA, the Kubernetes announcement happens from Google. They’re like, hey, we have essentially Borg, and we’re going to make it available to the world. It’s going to be a way of running containers. So you’ve got Google’s thing, and you’ve got Amazon’s thing. EKS is Kubernetes on top of Amazon where you don’t have to manage Kubernetes; it’s what we call a managed Kubernetes solution, or an engine. It’s very similar to GKE.
Brian Seguin
So what does that mean, you don’t have to manage it? What is there to manage in that ecosystem? Because my understanding is, when you’re deploying your application with ECS, you just push your container in, you know, you put in the endpoints that you need it to point to, and it just kind of runs, right? And then you can go through and you can add more containers alongside it if you need to scale, that sort of thing. Is that correct?
James Hunt
That is yes and no. But also yes. So let’s rewind back to the early days of ECS, and we’re gonna ignore Kubernetes, and just, trust me on this: EKS is bigger than ECS. From an investment standpoint, like, energy and effort and the number of concepts to keep in your head.
Brian Seguin
Energy, effort, footprint, consumption, all of those things?
James Hunt
It’s just bigger, it’s more complicated, and it’s more powerful.
Brian Seguin
Okay.
James Hunt
But it’s also bigger. And the complexity, maybe, is one of the reasons I think that ECS still exists today. Also, Amazon never deprecates or decommissions anything, unlike Google. So back in the 2015 days, what Amazon solved was: how do I keep my containers running, and how do I build an API? Because, you remember, Amazon’s always been driven by APIs. That’s one of the reasons the company has scaled so large internally: everything’s an API, even internal HR stuff is an API, all the things inside of Amazon are APIs. But they wanted an API for the public to consume that would allow them to start containers, stop containers, and scale the footprint of containers. And the value there is so that you can survive the bloomberg.com effect or the Slashdot effect or whatever you want to call it, where you have an unexpected spike in traffic, and your web presence or your application needs to be able to scale, more or less on a dime, to meet that need without falling over. Right. So the original version of ECS is based on what’s called the EC2 launch model, because containers need to run somewhere, and Amazon already has a whole other service for doing VMs. That’s what EC2, Elastic Compute Cloud, is: it’ll spin up a VM in a region with a given size, a certain amount of disk, a certain number of CPUs, and it’ll bill you hourly for it. So when you go to the console to set up and consume ECS and tell Amazon, I would like to set up a container, I would like three copies of this container, and I would like them all balanced behind an ELB, it has to put them somewhere. And that’s where we get into the difference between ECS on EC2 and ECS on Fargate.
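As a rough sketch, that kind of request to the ECS API might look like this with boto3; the cluster, service, task definition, and target group names below are all hypothetical placeholders:

```python
import boto3

ecs = boto3.client("ecs")

# Ask ECS for three copies of a container, balanced behind a load balancer.
# Every name and ARN here is a hypothetical placeholder.
ecs.create_service(
    cluster="demo-cluster",
    serviceName="web",
    taskDefinition="web-app:1",  # family:revision of a registered task definition
    desiredCount=3,              # "I would like three copies of this container"
    loadBalancers=[{
        "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/abc123",
        "containerName": "web",
        "containerPort": 8080,
    }],
)
```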
Brian Seguin
So wait, this is where ECS splits into two different camps.
James Hunt
Um, I don’t know if I’d classify it as two different camps. It’s more about–
Brian Seguin
Is it two different runtimes?
James Hunt
It is two different runtimes. Although I very strongly suspect that Fargate is also running on top of EC2. So at the end of the day, you’re still gonna end up on a VM inside of Amazon’s data center somewhere. But the main difference is in pricing, and in cost management. And the reason we have both of these is because when you go to the EC2 launch model, that side of ECS, your application is modeled as a set of services, and the services have tasks. So you’ll say, I’ve got... we’ll take the birthday NOC as an example. I did get that spinning; it is out on Twitter, @birthdaynoc on twitter.com.
Brian Seguin
And for those that don’t know what @birthdaynoc is, our last episode — Episode 12 I think — covers the birthday NOC.
James Hunt
Sure, I forgot what number it was. But yeah, last episode, I talked about all the stuff we’re running. And I do almost have the video set up for actually going through how that’s put together. But there are really three parts to it: there’s the writer, there’s the Redis key store, and there’s the front-end Social Media Manager. Inside of ECS, the Social Media Manager and the writer processes would each be a service. We’d have a writer service that is responsible for building out the content, you know, actually forming the words into sentences into tweets. And then we would have another service, in ECS lingo, for taking the tweets off of the Redis key-value store and putting them into the Twitter API. And those two are separate services (a) because they’re separate container images; one has Perl, the other I believe is written in Go. They scale differently as well. The writer’s scale needs to balloon up at the beginning of the process, when we have more stuff to write. But the Twitter bot side, the Social Media Manager side, only has to scale if we’re going to change the frequency with which we post to Twitter. So right now, we’re dropping a single tweet an hour, because that’s roughly the minimum number of birthday greetings we have without repeating ourselves. But as we add in more features to this, we will need to handle things like inbound mentions, or direct messages, or a website where people can go look at stats; those would all be different services.

Now those services have scale. So we would say how many container instances, or tasks in ECS lingo, do we want this service to be at. And that’s important because as we launch new versions of the container definition (we upgrade a version of the image, for example; new code goes out, or we bump the resource allotment; we need more memory allocated to this thing, or we need a new bind mount or a new volume or a new something), ECS will rolling-deploy all those things if you have a three-task service. It does that automatically, and that’s one of the cool things about ECS. This is 2015, right? Most instances of blue-green are highly specific to a pipeline tool or a workflow automation engine, if it’s even being thought of at all. So yeah, the ECS services decompose down into tasks, and the tasks are the things that Amazon is managing. Amazon’s the one who’s going to say, oh, this service is supposed to have three tasks underneath it, three container instances; I only see two, I’m gonna go ahead and spin a third one up, just to make sure we’re, you know, on par with the spec we were asked to implement. That’s the value.
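As a sketch of that service-and-task model in boto3 terms: register a new revision of a task definition, then point the service at it, and ECS handles the rolling deployment. The family name, image, and resource numbers here are hypothetical:

```python
import boto3

ecs = boto3.client("ecs")

# Register a new revision of the task definition: new image tag, bumped memory.
td = ecs.register_task_definition(
    family="birthday-writer",
    containerDefinitions=[{
        "name": "writer",
        "image": "example.registry/birthday-writer:v2",  # new code goes out
        "memory": 512,                                   # bumped resource allotment
        "essential": True,
    }],
)

# Pointing the service at the new revision kicks off the rolling deployment:
# new tasks come up, old ones drain, and desiredCount stays satisfied.
ecs.update_service(
    cluster="demo-cluster",
    service="birthday-writer",
    taskDefinition=td["taskDefinition"]["taskDefinitionArn"],
)
```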
Brian Seguin
And Amazon really only does this if it really needs to, right? Or can you actually set the parameters and prevent it from doing, you know, the spillover if you need to? Or, well–
James Hunt
The scale-up is something you control via the API, but the run-this-many-copies-of-the-container-no-matter-what, that’s built into ECS. So when you tell ECS, “I want three containers,” it will make sure you have three containers. It won’t make four, unless you’re doing a rollover deployment or a rollout, right? At which point it will build a fourth one, put it into the mix, and take one down. It’s actually the rolling update strategy you see in Kubernetes. Nice, right? Yes.
Brian Seguin
So does it, like, auto-scale down after the workload is finished?
James Hunt
ECS doesn’t. ECS natively provides just the building blocks for you to have a monitoring system that’s looking at load, or memory utilization, or latency in servicing a request, and going, hey, the cluster looks overloaded; tell ECS to double the number of tasks.
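A crude sketch of that bring-your-own-autoscaler pattern, watching one CloudWatch metric and doubling the task count; the cluster and service names and the threshold are hypothetical:

```python
from datetime import datetime, timedelta

import boto3

ecs = boto3.client("ecs")
cw = boto3.client("cloudwatch")

def maybe_scale(cluster="demo-cluster", service="web", cpu_threshold=80.0):
    """If average service CPU over the last five minutes is hot, double the tasks."""
    stats = cw.get_metric_statistics(
        Namespace="AWS/ECS",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "ClusterName", "Value": cluster},
                    {"Name": "ServiceName", "Value": service}],
        StartTime=datetime.utcnow() - timedelta(minutes=5),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Average"],
    )
    datapoints = stats["Datapoints"]
    if datapoints and datapoints[0]["Average"] > cpu_threshold:
        desired = ecs.describe_services(
            cluster=cluster, services=[service],
        )["services"][0]["desiredCount"]
        # ECS happily takes the new number; deciding *when* to scale is on you.
        ecs.update_service(cluster=cluster, service=service,
                           desiredCount=desired * 2)
```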
Brian Seguin
Got it. So if I’m managing a team that is deploying our applications to ECS, and we have our busy times of the month where we get a bunch of inbound requests and we scale up, I, or my technical team, have to go in and actively scale those down, to expand and contract with the actual application demand. Is that how it works, right?
James Hunt
That is the basic idea. They have added services around this to make it a little easier. So you can do batch processing now in ECS, using task scheduling.
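For batch-style work, the sketch below fires off a one-shot task rather than a long-running service; the cluster and task definition names are hypothetical:

```python
import boto3

ecs = boto3.client("ecs")

# Launch a standalone task; it runs to completion and exits, with no service
# trying to resurrect it afterward.
ecs.run_task(
    cluster="demo-cluster",
    taskDefinition="nightly-report:3",  # hypothetical batch task definition
    count=1,                            # one copy for this batch run
    startedBy="nightly-cron",           # free-form tag for tracing who launched it
)
```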
Brian Seguin
Because I think EKS in general is... sorry, switching over to EKS specifically, just as a question for differentiation: EKS has an autoscaler feature, is that correct?
James Hunt
It does.
Brian Seguin
And that will automatically scale down based off of workloads?
James Hunt
That will scale up containers, if you’re talking about the horizontal pod autoscaler. The HPA is the one that will scale out additional container instances. There’s also a node-scaling option; I’m actually not sure, I haven’t used that in EKS, but I’ve seen it in other Kubernetes distributions. The value of EKS over ECS is that EKS is Kubernetes, and as such, it has portable APIs for dealing with these higher-level constructs, right? So when you tell EKS, I need a deployment of all of these pods, I need six copies or six replicas, and I need this persistent volume claim, and I need this Ingress rule, and I need all this networking stuff, the language you’re speaking is Kubernetes. You could take that exact same config and send it over to a GKE cluster on Google’s infrastructure, and it should work the same way.
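That portability is the point. Here is a sketch of such a deployment, expressed with the Kubernetes Python client; the same object should apply to an EKS or a GKE cluster, with only the kubeconfig context changing. The context name and image are hypothetical:

```python
from kubernetes import client, config

# Point at whichever cluster you like; the objects below don't change.
config.load_kube_config(context="my-eks-cluster")  # or "my-gke-cluster"

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="web"),
    spec=client.V1DeploymentSpec(
        replicas=6,  # "I need six copies or six replicas"
        selector=client.V1LabelSelector(match_labels={"app": "web"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "web"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(name="web", image="example.registry/web:v1"),
            ]),
        ),
    ),
)
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```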
Brian Seguin
You get the cloud portability; you can have the same experience on-prem and with other cloud providers. It’s the same type of configuration because it’s all Kubernetes-based. I think the added benefit of using Amazon’s EKS is you get a lot of the different Amazon services; pretty much everything under the sun that you could think of, Amazon offers.
James Hunt
Right. You cannot take your ECS integration anywhere else; there is no ECS on Google.
Brian Seguin
So, just for clarification, you know, if you’re deploying EKS, you have additional overhead for doing some operational things, some maintenance things, and, you know, some more attention is needed for the actual runtime and deployment. Whereas ECS has a lot fewer runtime and deployment configuration things that you need to worry about. But it’s a lot easier to actually, if you want to–
James Hunt
I would take one small issue,
Brian Seguin
okay,
James Hunt
It’s not about maintenance of the cluster, because EKS is a managed cluster. So inside of Kubernetes, you have an API server, you have an etcd data cluster, and you have one or more nodes, or kubelets, that comprise the whole of the cluster. Those three things have to be managed by somebody. If you’re talking Kubernetes that you’re running on VMware, your own vSphere ESX back in the data center, you’re going to be managing kubelet, you’re going to be managing the API server, and you’re going to make sure that etcd’s Raft algorithm doesn’t go off in the weeds and eat your data. With EKS, Amazon’s doing that, right? It’s their responsibility to back up etcd; it’s their responsibility to upgrade the API server. And this is the same story for everybody who’s doing a managed Kubernetes. The only responsibility you have as a cluster owner in these managed Kubernetes offerings is to use the cluster, using normal Kubernetes configuration. The difference is that Kubernetes configurations are way more complicated than ECS, because ECS’s domain model, how you can model your problem, is greatly simplified.
Brian Seguin
So it’s more of a learning curve for the application developers. If your developers are used to using ECS and they have to go to EKS, they have to learn how to do the configurations in the appropriate Kubernetes manner, which could theoretically be a heavy investment, depending on how many developers you have working for your organization, their time schedules, and how much corporate overhead you push down on them for touchy-feely meetings.
James Hunt
And retraining budgets notwithstanding, there are also the things that slip through the cracks when you’re not a subject matter expert on how to configure Kubernetes stuff. There’s a whole bunch of ways to get it wrong, you know, accidentally making privileged containers. There’s a whole bunch of stuff you can do completely by accident just by not being an expert.
Brian Seguin
Ideally, if you’re consuming EKS as a customer, you have some form of DevOps center of excellence that is working with the developer teams deploying your applications, that can, you know, make sure they’re adhering to best practices, make sure there are guardrails in place, so they’re doing all the appropriate things. Whereas ECS is more like, you don’t really need that much of an ops team.
James Hunt
Not as much. I mean, I wouldn’t rule out an ops team; an ops team is always helpful. But there are a lot fewer moving parts in an ECS deployment or an ECS configuration. So before we run out of time, I do want to talk about the difference between the ECS EC2 launch and the ECS Fargate launch, because a lot of times I hear people say, oh, we don’t want to use ECS, we want to use Fargate. Like, okay, but you do know you can use both of those together. Incidentally, I believe you can also use EKS with Fargate, but I’m not going to get into that this episode.
Brian Seguin
Is this the Tale of Dos Equis?
James Hunt
A Tale of Dos Equis, yes. There are two ECSes. There’s ECS that’s backed by VMs that you keep. So in that instance, you tell ECS: “Not only do I want this service, and this many containers, but I also want to run it on these instances.” And when you build out those instances via the ECS API, behind them are EC2 VMs. They come with additional software, an agent daemon for communicating, presumably with the ECS control plane, for marching instructions: what images to pull, when to pull them, and what version of Docker, or whatever container runtime it is, we’re using. So you’re not managing the EC2 instances yourself, but you are paying for them, and they are being managed on your behalf. The benefit of that is you know that you’re never going to exceed the usage of those EC2 instances; at least, you’re never going to exceed the billing for the usage of those EC2 instances.
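A small sketch of peeking at that EC2-backed capacity through the API; the cluster name is hypothetical:

```python
import boto3

ecs = boto3.client("ecs")

# With the EC2 launch type, the agent on each VM registers it with the cluster
# as a "container instance."
arns = ecs.list_container_instances(cluster="demo-cluster")["containerInstanceArns"]
details = ecs.describe_container_instances(cluster="demo-cluster",
                                           containerInstances=arns)
for ci in details["containerInstances"]:
    # The EC2 VM behind the capacity, whether its agent is checking in,
    # and how many tasks it is currently running.
    print(ci["ec2InstanceId"], ci["agentConnected"], ci["runningTasksCount"])
```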
Brian Seguin
And then you could also have reserved instances where, right, you know your demand in advance, and... yep, yep.
James Hunt
So you have all the tools that you have on the VM side, now available on the container side, just by virtue of the containers running on top of VMs. The downside is that you’re going to pay for those EC2 instances even if you’re not using them to the fullest extent. In fact, on average, you’re going to be overpaying in the EC2 launch model. But it’s a consistent, regular monthly overpayment, right? On the opposite side, you’ve got Fargate. Now, I wouldn’t consider Fargate quite serverless. Serverless is weird, people. Anytime you don’t have to manage the server, people say, oh, it’s serverless. And yes, there are no servers that you’re aware of in Fargate, but there are obviously servers to run these containers on. With Fargate, you pay essentially as if EC2 instances were popping into existence as you needed them. Right.

So let me walk you through a real story, with ECS in both scenarios. One: we spin up an EC2 instance, and that instance has four gigs of RAM and two cores, a really small instance. Then we tell ECS we want to schedule 100 tasks. We are now trying to cram 100 containers into four gigs of RAM and two cores. That might work; you might have very lean containers that can do that. Those would be very small containers; you might be talking, you know, Rust services that are highly optimized, that don’t have a big memory footprint, that do a minimal amount of work, there’s just a lot of them. You’re performing low-touch work and you want low latency and high throughput, so you have a ton of them. In the EC2 launch style, you’re only ever going to pay for that four-gig, two-core instance. If you double the number of tasks, if you go from 100 containers to 200 containers, there’s a good chance you’re not going to get to 200. You’re not going to be able to keep spinning up tasks ad infinitum, because you’re going to hit a ceiling at some point: the ceiling of how much you can actually physically fit on that virtual machine. At that point, your ECS dashboard is not going to be happy, because it’s going to have a lot of stuff that it cannot start but wants to, because you told it to. Like the puppy where you thought you threw the tennis ball, but you’re still holding it, and it’s still looking, and by God, it’s gonna find it, and it keeps looking at you and looking back. Don’t be mean to ECS, is what I’m saying; that’s animal cruelty.

The Fargate story is completely different. If you go from 100 containers to 200 containers, guess what? It works. Your bill doubles, but it works. If you then drop that down to four containers, because you’re in a lull, or it’s overnight, or whatever, that also works, and you’re not paying for the extra. You’re only going to pay for the four instances for however many instance-hours... sorry, for the tasks, for however many task-hours they executed.
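The Fargate flavor of the earlier service sketch: same API, but with launchType set to FARGATE there are no container instances to keep, and you are billed per task. Fargate requires awsvpc networking, so the subnet and security group IDs (hypothetical here) come along too:

```python
import boto3

ecs = boto3.client("ecs")

ecs.create_service(
    cluster="demo-cluster",
    serviceName="web-fargate",
    taskDefinition="web-app:1",  # hypothetical Fargate-compatible task definition
    desiredCount=200,            # going from 100 to 200 "just works" here;
                                 # the bill doubles instead of tasks stalling
    launchType="FARGATE",
    networkConfiguration={"awsvpcConfiguration": {
        "subnets": ["subnet-0abc123"],
        "securityGroups": ["sg-0abc123"],
        "assignPublicIp": "ENABLED",
    }},
)
```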
Brian Seguin
So, another way of trying to say this: in the ECS example here, you’re talking about doubling the amount of containers inside of what you already have, the VMs provisioned for the workload. Yeah. So another way to scale that is that you would have to actually go and provision more pieces alongside; you’d have to, like, duplicate, right, whatever you’re doing, instead of trying to cram it into what you already have, right?
James Hunt
In reality, what would happen in that case is, when you notice that the tasks weren’t starting up because of insufficient resources, you’d go into the dashboard, or your API, or whatever tool you’re using, and bump it up: add another instance, add a third instance, whatever, to make the cluster scale. But that doesn’t happen for you automatically. Which means a human, who hopefully has your budgets in mind, is the one making that decision. Whereas on the Fargate side, the robots, the machines, are making that decision, because they’re programmed to. Which means if you’ve mistyped an instance count or a task count, you might well end up with a very large AWS bill that you weren’t expecting.
Brian Seguin
Well, from the ECS standpoint, you’re going to be over-provisioning all of these instances, because you don’t want to run out of the, you know, the footprint that you actually need to run your workloads. You don’t want to have to be constantly adjusting up and down. So I think the developers will just naturally over-provision there. Whereas if you’re architecting your applications a little bit differently to operate with Fargate, for those workloads that are flexible, it’s going to be easier for them to scale up and scale down. Is that pretty much right? Yep. Interesting. Yeah. So it’s fascinating to me. So there’s, you know, you have your EKS, right, which is your Kubernetes offering on top of Amazon. Then you have your ECS, which we just spent a lot of time talking about. But it’s also interesting that ECS is really divided into two main deployment models; not runtime models, but different models for how things operate inside of it.
James Hunt
Right, and you can flip between one and the other. If you find that, you know, we’re overpaying for the EC2 launch model, you can move over to Fargate. If you had a scaling event and you’re afraid about the bill or whatever, you can go from Fargate to the EC2 launch; I don’t see much of that, to be honest. Most people either start on EC2 out of conservatism on the budgeting side, and then they move to Fargate, because they’ll actually save more money doing that than they will–
Brian Seguin
Well, and ideally you’re going through and identifying the application workloads that you can actually pull out and re-architect, so that the ones that need that expand-and-contract are utilizing Fargate, and the ones that are going to have that consistent runtime footprint are using just the normal ECS.
James Hunt
It only makes sense to do the EC2 launch if it’s cheaper for you than it is to run the same number of workloads on Fargate, or if you really are afraid of overstepping your budget; and there are ways to manage that as well. Amazon’s not going to just scale up your service without you knowing.
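One of those ways to manage it is a budget alert. A sketch with the AWS Budgets API, using a hypothetical account ID, dollar limit, and email address:

```python
import boto3

budgets = boto3.client("budgets")

# Alert a human before a mistyped task count turns into a surprise bill.
budgets.create_budget(
    AccountId="123456789012",
    Budget={
        "BudgetName": "ecs-monthly-cap",
        "BudgetLimit": {"Amount": "500", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[{
        "Notification": {
            "NotificationType": "ACTUAL",
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": 80.0,  # fire at 80% of the monthly limit
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "ops@example.com"}],
    }],
)
```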
Brian Seguin
Well, I think from a business standpoint, if you have massively scalable loads, you know, it’s better to have those compartmentalized, just so you know which ones you need to be worried about. And if they’re not hitting the right business requirements, making you the right money versus the cost they’re incurring, then you should really be considering, you know, what else you have to do to make it work.
James Hunt
I do want to point out, we mentioned that ECS has the ability to run on EC2 instances or on Fargate; EKS does as well. So you really have a full matrix: you can do ECS-style container specification on top of VMs you’re on the hook for, or in the cloud, and you can have EKS schedule pods on Fargate. So we’ll talk about that in the next episode; I think we’re getting to EKS.
Brian Seguin
Awesome.
James Hunt
And I’m hoping, actually, maybe not the next episode but the episode after, because I’m hoping in the next episode we’ll have some more information to talk about on ECS, as we build the birthday NOC into an ECS service, and actually talk through some of the finer points of that implementation story.
Brian Seguin
Can’t wait. Thank you for listening to Rent, Buy, Build. I’m Brian Seguin.
James Hunt
And I’m James Hunt.