Episode 4 – Source Code/Control Management

Welcome to episode 4 of the Rent, Buy, Build Cloud Native podcast! Today we're going to dive right into VCS/SCM, debate what the middle C stands for, and talk about git, git, git!

James Hunt
Welcome to Rent / Buy / Build. I’m James Hunt.
Brian Seguin
And I’m Brian Seguin.
James Hunt
Each week we discuss the pieces and parts of Cloud Native platforms and answer the question “should you rent this, buy this, or build it yourself?” In this episode, we’re talking about source code control management, or as you probably know it better … Git.
Brian Seguin
James, when we were doing research for this podcast, the first question I had to Google was: are there other source control options besides Git?
James Hunt
And what did Google say?
Brian Seguin
I was actually surprised. There were many articles titled “Alternatives to Git,” and there seem to be about nine of them.
James Hunt
Sure.
Brian Seguin
But, you know, I was kind of having a difficult time finding an actual “build” scenario. Is there anyone out there building source control from scratch?
James Hunt
So to answer that question, I’d like to talk through a little bit of the history of how we got to where we are today with Git.
Brian Seguin
How did we get to where we are?
James Hunt
How did we get here? Record Scratch. Ferris Bueller. You’re probably wondering how we ended up with Git.
James Hunt
So it all starts way back, as most things do in this space, with Unix, and a system called RCS, or “Revision Control System”. RCS was interesting because it let you keep backup copies of individual files. Say you had a file that you were working on, a report that you were writing. I don’t know, you’re on Unix, so it’s probably something related to AT&T and Bell Labs. You’re writing this report and you want to save a copy. You could just save a copy on the file system: report2, report3, right? The old-school source code control. With RCS, you could keep the same file name, and it would remember what the file looked like at certain checkpoints: you would check in or check out that file. And that verbiage actually survives today in Git, when you “check in” or “commit” your code. That’s when you say “this is the version I want you to keep track of.”
James Hunt
Now, RCS was finicky and annoying because it worked on a single-file basis. So not long after RCS, the venerable Concurrent Versions System (CVS) was built, and for the better part of the ’90s, CVS was the version control system. It could run on a central server: you would check out all the code from the central server, and multiple developers across the Internet could work together. A lot of the software we use as the bedrock of Linux- and Unix-style things today was originally developed in CVS. Most of the GNU project was built on CVS.
James Hunt
But CVS has a lot of structural problems. So around 2000, the Subversion project comes along, and Subversion was “let’s do CVS but better,” with a bigger focus on branching semantics, to allow people to try ideas out in what’s called a “branch” off of the main “trunk” of code. Development is not linear, right? Teams go off and try things; if it works, they merge patches back in. So it’s kind of this branching, tree-like structure. Subversion makes the branching easier, but it makes the network footprint a lot heavier.
James Hunt
So when the Linux kernel project is looking for a better source code control system, Linus Torvalds goes out and finds a thing called BitKeeper, which is a proprietary, licensed bit of software that does blob-based commits and version control. The problem with BitKeeper was that it was proprietary, and it was being used for an Open Source thing. So what Torvalds did is he built his own, and in classic Linus form (because he says he names all of his software after himself, having written Linux) he named it Git, because he was a “stupid git” for having tried to write that himself.
James Hunt
In the intervening time, while Git was coming of age, a whole bunch of other research projects graduated from the realm of academia into “mainstream usage,” and I’ll use air quotes there: projects like Bazaar and Darcs and Mercurial that still have a fair number of adherents today. But I think it’s not a controversial opinion that Git is the source code control system. No one’s using CVS, or even Subversion, on newer projects. GitHub has made Git accessible to anyone for free. And Git itself has gotten more user-friendly over the years.
James Hunt
Yes, there are other projects out there that you can use instead of Git for your source code management, and that doesn’t even scratch the surface of what Microsoft does with Visual SourceSafe and all the things that integrate with the Windows ecosystem. But as far as people setting out to build new revision control, I don’t think that’s a thing, unless you’re a developer and you don’t like what other developers have built, so you’re going to go build your own thing. But for companies or organizations, there is almost no value in building your own revision control system.
Brian Seguin
Interesting. So in this episode, we’re really going to be talking about rent or buy.
James Hunt
Yep.
Brian Seguin
So if we talk about rent, I mean, everybody knows GitHub, right? What are the other options out there for a rent scenario?
James Hunt
Right. So for rental, you’re talking SaaS: somebody else is managing the storage for your repos, someone’s managing the access to your repos. github.com, obviously, as you mentioned, is the flagship, the standard bearer in this space. But there’s also bitbucket.org, which is an Atlassian service that runs their Bitbucket software and makes it available on a free plan and some paid plans. And GitLab, which is similar to GitHub in terms of the feature set and what they’re trying to do with the product, has a SaaS offering. Those are kind of the big three that I’ve seen used in a variety of enterprises and organizations.
Brian Seguin
Okay, so then I guess the major disadvantage to a SaaS-based solution is really that someone else is managing your code.
James Hunt
Really?! You’re gonna go straight to disadvantages and not talk about all the advantages?
Brian Seguin
Well, I mean, sure–
James Hunt
That’s a little pessimistic. To me, the biggest advantage of SaaS-based SCM is really the feature set. With SCM, if you just say, “well, it’s Git,” right? That’s a closed feature set. It’s keeping track of history, letting you revert back to previous not-broken versions of your code base, and the ability to push that code elsewhere and let other people work on it. That’s SCM features in a nutshell.
James Hunt
But these SaaS solutions have realized that (a) nobody’s going to pay for that and (b) running a free service that doesn’t make any revenue never really works. So what the SaaS offerings do is build complementary features around the source code management solution. GitHub continues to astound me with all the things they’ve managed to add into the Git experience. It started years and years ago with the concept of a pull request. Pull requests, by the way, are not really a thing in Git proper. That is a GitHub concept that they popularized and is now ubiquitous. There’s no SCM SaaS offering out there that doesn’t let you do PRs and assign them. Pull requests also facilitated forking of repos. The code review engine inside of GitHub is the main reason I use GitHub, because it lets me manage code in one place and give feedback and mentor people and generally make the products that I’m working on better through collaboration. And then the most recent addition to the GitHub.com feature set is, of course, GitHub Actions, which we talked about a little bit in our last episode on CI/CD. The idea is: look, you already have the code here, and you’re already doing the PRs, the process of updating that code collaboratively, so why not have those events trigger CI? Have those events trigger unit tests to inform your pull requests, so you can merge code knowing that it works? Those, to me, are the biggest advantages of SaaS-based source code control management.
Brian Seguin
Interesting. I still want to go right to disadvantages.
James Hunt
Okay, we’ll do–
Brian Seguin
Because–
James Hunt
Hey, I’ll tell you what, Brian: we’ll do one disadvantage. And then we’ll go back to advantages, because I have one more that I think is very important to business-process-minded folks.
Brian Seguin
But in all seriousness, there’s a massive misconception out there that because GitHub is on the web, available to everybody, it must be less secure than putting your code on your own servers, right?
James Hunt
To an extent. With the security of GitHub, there are a couple of different threat vectors you have to consider.
James Hunt
There’s the accidental publication threat vector, right? I accidentally pushed the secret sauce code to a public repo on GitHub, and now everybody in the world can see it. And once something is seen on the Internet, it never goes away – the Streisand effect. But GitHub, and all of the offerings, answered that with private repositories, where you can say, “look, this repo, this set of code is only available to the people I say can see it.”
James Hunt
So that’s the first order of security: if you have proprietary or intellectual property stuff in your code (and if it’s core competency code, if it’s core business logic, it’s probably under IP) you can make it private, right? We do a ton of these, either for customers or for our own stuff. And it works pretty well, because the threat vector of a random person on the Internet seeing it is closed off.
James Hunt
The other threat vector is a breach of GitHub itself. If someone carries out a SolarWinds-level attack against the GitHub infrastructure, then yes, your code will leak even if it’s private, because on the back end, that distinction doesn’t matter anymore, right? If you can go directly to the file servers that contain the Git repos, it doesn’t matter what the API on the front end is saying you can or can’t do. I think, though, that that’s a bit of an overblown threat vector. When people say “well, we have to run our source code management on prem because, you know, GitHub might get hacked,” like, yeah, but GitHub is run by Microsoft. If Microsoft gets hacked, I feel like you’re dealing with a threat actor who is far above and beyond what you can individually protect against, right? This is like state actors, right? If China wants to come and take your code, they’re gonna do it, regardless of whether it’s in Microsoft’s data centers or in yours.
Brian Seguin
So the real threat here is accidents from your developer team: someone submitting credentials or private keys to a public repo by mistake.
James Hunt
Yep, I think the accidental disclosure is your biggest, most egregious threat avenue. And it does get closed off by making things private, and it does get closed off by running things on prem.
Brian Seguin
Okay, so let’s talk about the other advantage that you wanted to bring up.
Brian Seguin
So I’ll be the optimist.
Brian Seguin
(uncontrolled chuckling)
James Hunt
Thanks for letting me be the optimist this time, Brian. I want to talk about skill set portability. What I’m talking about is, if you’re running a large enough development organization (and I’m not talking a single team, but teams of teams), you’re going to be hiring people, as you grow, or as people attrition out and go on to bigger and better things, and you hire to replace and re-fill your headcount. You’re going to be bringing people in who have experience with the public SaaS offerings. I have not interviewed a developer who hasn’t used GitHub. It’s just so easy, and it’s free. It’s cheap, easy and free, so everybody does it. So everybody’s at least passingly familiar with pull requests and GitHub issues and what a repo is, and how GitLab does things compared to how GitHub does things. And because the SaaS offerings are trying to appeal to a large, broad market, they have actually coalesced around a core set of features that are all basically the same.
James Hunt
If you use GitHub and you’re given Bitbucket, you can figure it out. And it makes a lot of sense and the concepts match. And that’s important when you’re hiring developers, because you don’t want them to get hung up on trying to figure out all your tooling before they can get on to doing the productive business logic work that you actually are paying them to do.
Brian Seguin
It’s a common language. Everybody speaks Git.
James Hunt
Yep.
Brian Seguin
I think going back to the other potential disadvantages, I mean, what is the licensing model for this? Is this a license? Is it consumption based? Is it repo based? Is it size based? What are we talking about?
James Hunt
It’s almost invariably “repository count” based, plus whatever other resource makes sense for the other feature sets. That’s the pay-as-you-go model for rentals. Usually, these companies are very supportive of Open Source software. So if you have projects that you have built and open sourced (which is a whole other episode we should do) you get a lot of this stuff for free, because you’re ostensibly doing this work in public and to the benefit of the entire Internet ecosystem.
Brian Seguin
So if you’re doing Open Source, it just makes complete sense to throw it on GitHub, or one of the other SaaS providers, and just call it a day.
James Hunt
I mean, from a cost perspective, yeah, because it’s free and someone else is managing it; you don’t have to pay for it. I have used GitHub for years. And it wasn’t until I needed private repos for some personal projects that I went ahead and paid for an account. And that’s where the pay-as-you-go model enters into it. If you’re trying to keep stuff out of the public, GitHub is going to charge you. They used to charge you per repo, right? I think you got up to 10 private repos, and then you would step up to the next plan, and then you’d get 50. A while back, probably three or four years ago, they changed it to unlimited private repos, and then they charged you per user. So as you add staff, your costs will go up, but you can have 100,000 private repos, and it doesn’t cost you any more than having one private repo.
Brian Seguin
That makes sense for a lot of these enterprises that have 10,000 private repos and 2,000 developers, right? Are there any other disadvantages to a SaaS solution?
James Hunt
Like I said, besides the private repos often costing money, the bundled features usually have limits. We talked about GitHub Actions in our last episode on CI/CD: the runners for private repos are limited in the number of minutes that they can run, and then you have to pay more. GitHub has a thing called LFS, Large File Storage, because you’re not supposed to put really big files in Git; it’s kind of counter to the model of how Git tracks changes internally. So if you want to store large images or PDFs or ISO files, LFS is a service that will host the files, and then the Git repo has a link to the LFS store so that when you clone the repo down, you can pull those out. We’ve actually used that on a couple of private repos for BOSH releases, where we didn’t want to put the packages in S3, so we put them in LFS, and that costs us monthly to run. Right? So it depends on how much you store, but really, the cost is based on usage or staffing. And that’s really the only disadvantage, aside from the threat model of “what if somebody steals my code?”
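For readers following along, the “link to the LFS store” mentioned above is a small text pointer file that Git commits in place of the big file. The Python below is a rough sketch of that pointer layout as described in the published Git LFS pointer format; the helper name lfs_pointer and the sample bytes are made up for illustration, and in practice the git-lfs tooling writes these files for you.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS-style pointer for the given file content.

    The repo stores this short text; the real bytes live in the LFS store,
    looked up by their SHA-256 digest.
    """
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Hypothetical stand-in for a large binary such as an ISO image.
print(lfs_pointer(b"pretend this is a large ISO image"))
```

The pointer stays a few lines long no matter how large the real file is, which is why the repo itself stays small.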
Brian Seguin
Or you accidentally push —
James Hunt
It’s interesting, because pushing private keys to GitHub is so passé. Everybody’s done it. I don’t know anyone who’s done enough commits who hasn’t. I think once you hit a couple hundred commits, you’ve probably pushed a cred.
Brian Seguin
And that’s why you rotate your certs constantly, right?
James Hunt
Well, that, and there are services out there that will scan GitHub looking for these things, and then kindly let you know, “Hey, by the way.” Amazon (AWS) will actually revoke keys if they find them pushed, because it’s easier for them to do that.
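For readers curious what that kind of scanning might look like, here is a minimal Python sketch. It is not how AWS, GitHub, or any particular service actually does it; the two regular expressions are common heuristics (an “AKIA”-prefixed access key ID and a PEM private key header), and the function name find_secrets and the sample text are assumptions for illustration.

```python
import re

# Heuristic only: "AKIA" plus 16 uppercase letters or digits looks like an
# AWS access key ID. A match does not prove the key is real or still active.
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")
# PEM headers such as "-----BEGIN RSA PRIVATE KEY-----".
PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----")


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, kind) for each line that looks like a secret."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if AWS_KEY_ID.search(line):
            hits.append((number, "aws-access-key-id"))
        if PRIVATE_KEY_HEADER.search(line):
            hits.append((number, "private-key"))
    return hits


sample = "\n".join([
    "region = us-east-1",
    "aws_access_key_id = AKIAIOSFODNN7EXAMPLE",  # placeholder key from AWS docs
    "-----BEGIN RSA PRIVATE KEY-----",
])
print(find_secrets(sample))  # [(2, 'aws-access-key-id'), (3, 'private-key')]
```

Running something like this before a push is cheaper than rotating credentials afterwards, though a pattern match will miss secrets that do not follow a known format.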
Brian Seguin
That’s awesome.
James Hunt
I mean it is until your keys stop working.
Brian Seguin
So I think for a buy–
James Hunt
Then you get in Slack, and you say, “hey, who pushed the keys to GitHub again?” I’m not saying I was responsible…
Brian Seguin
And hopefully it’s not frequent. But it’s nice that there are some safeguards in place out there from some of those public IaaS providers.
Brian Seguin
So for a buy scenario, I mean, this is really if you’re operating on prem, or if you have limited network space. But even in that scenario, you can still use one of these SaaS solutions; you’re just pulling your code down and then redeploying it inside of whatever environment you’re working in.
James Hunt
There are really three flavors of “buy.” First, there’s the on-prem SaaS, right? So, Atlassian Bitbucket: you can run a Bitbucket server on your own infrastructure. That’s technically a “buy,” because you’re licensing Bitbucket. I think it’s licensed the same way, per user; I’d have to double check. And GitHub Enterprise: you can run all of the GitHub APIs and everything. Some things aren’t included in Enterprise, but they’re rotating in. You can run that on-prem on a VM or on a physical box.
James Hunt
So that’s the SaaS, but in your own data center. Then there’s stuff that doesn’t have a SaaS component, like Gogs, which is an embedded Git server like GitLab. If GitLab didn’t have a SaaS, they’d be in the same boat as Gogs.
James Hunt
And then there’s the “I’m an absolute miser, and I don’t want to pay for anything” option: you can just install a Linux box, a stock Ubuntu or Debian or Red Hat, put SSH on it, and run Git over SSH. You will not get any of the other features: you won’t get pull requests, you won’t get code review or CI/CD. But for years I ran (and I actually still do, I just don’t use it as much) a private Git-over-SSH server in Linode. I think it was SliceHost at the time. And it worked great for me. Now, as I scaled to having more people commit, it was just easier to move to GitHub, because it was free, and all this stuff was public anyway. But those are your three options.
Brian Seguin
So I guess from an advantage standpoint, do we get rid of that security issue? Do we get rid of the publishing-keys-to-the-public-cloud type thing? Is that answered?
James Hunt
Assuming, of course, that you’re not also exposing your on prem to the public Internet. It goes without saying that just going to an on-prem “buy” isn’t going to make you more secure by default, but it does take care of the public visibility, because it’s all behind firewalls and NAT devices. And you still get the skill set portability if you’re in that first flavor, that SaaS-but-on-prem. GitHub Enterprise looks and feels just like GitHub.com, and Atlassian Bitbucket looks just like bitbucket.org, because it’s the same software. So if you’re doing on-prem licensed GitHub Enterprise, it’s really just a matter of having a different URL to go to to get to the PRs.
Brian Seguin
Yeah. And I’d have to assume it’s less of a consumption-based model. It’s more of a licensing model.
James Hunt
Yeah. Because with those, usually you either pay for a site license or you pay per number of instances you’re going to run, so if you’re clustering things, you’re paying more. But they’re not charging you per repo. They’re not charging you per user, because you’re running on your infrastructure. So that resource expense is already in the infrastructure.
Brian Seguin
And is that a large footprint? Or does that really depend on what you’re throwing in the repo? Or how does that work?
James Hunt
It really does depend more on how often you’re accessing it. And I’m going to tell a story about a CI/CD solution named Concourse that overpowers it. I think I’ve now been on two customers that have had to hack something in, because Concourse, as a CI/CD solution, needs to know when things happen on the SCM side. It needs to know when a commit gets pushed, and the way Concourse does it is it polls: every 30 seconds, it says, “Hey, Git, is there anything new? Hey, Git, is there anything new?”
James Hunt
And it does that for every single input to every pipeline. So with the first pipeline, you get, every 30 seconds, “Hey, is this thing committed? I need to do tests or something.” As you add more inputs and more pipelines, that just gets worse. And we’ve had to implement things where Concourse talks to a webhook kind of buffer, so that we didn’t have to scale up GitHub Enterprise to keep it from falling over because Concourse was just pestering it too much.
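To put rough numbers on the polling story above, here is a back-of-the-envelope sketch in Python. The 30-second interval comes from the conversation; the fleet size (50 pipelines with 4 Git inputs each) and the function name polls_per_day are made-up assumptions for illustration, not figures from any customer or from Concourse itself.

```python
def polls_per_day(inputs: int, interval_seconds: int = 30) -> int:
    """Requests per day when every input is polled once per interval."""
    seconds_per_day = 24 * 60 * 60
    return inputs * (seconds_per_day // interval_seconds)


# Hypothetical fleet: 50 pipelines, each watching 4 Git inputs.
pipelines = 50
inputs_per_pipeline = 4
total = polls_per_day(pipelines * inputs_per_pipeline)
print(total)  # 576000
```

The load grows linearly with the number of inputs whether or not anything was committed, which is the appeal of having the Git server push a webhook only when something actually changes.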
James Hunt
But yeah, it really depends on your usage from a clone perspective, a push perspective, and, I guess, just the PRs: how often does the web interface get used? And you are on the hook, by the way, for all the scaling and capacity planning there. It’s one of the chief disadvantages of really anything on-prem, but specifically SCM: you have to make sure it’s available, that it’s up and functional, and that it’s scaled, and you’re planning for your capacity in terms of disk storage. That’s usually not a big deal, because Git is source code, and source code isn’t big. So you don’t generally have a “well, we blew out a terabyte of disk on a repo” situation. I’ve never seen that. I think even the Linux kernel source fits on a CD still.
Brian Seguin
Okay. So I guess, kind of summarizing here, we’re either talking about a “rent” or a “buy” scenario. “Build” is not really a thing for this, right?
James Hunt
If someone asked me, “should I build my own SCM?” they wouldn’t even get half of the question out; I’d say “No! Don’t do that.”
Brian Seguin
I didn’t even realize that people were asking that question.
James Hunt
No one’s asked me that question. But if they did, Brian, I am ready on the draw to cut them off–
Brian Seguin
Should I build my own–
James Hunt
NO! DON’T DO IT!
James Hunt
I think you should always rent your SCM. I think for the most part, Git is never a core competency. It’s usually just a thing you have to have to make modern development practices work. You can’t do GitOps without Git. You can’t really do CI/CD without source code management or source control. And I think the best value proposition for anybody out there, short of security requirements or physical air gapping or other things that always intrude on a public cloud SaaS strategy, is to push those out so that somebody else is managing all of the things that make the thing work.
James Hunt
I think you should buy when it makes financial sense. If you’ve got so much spend on your SCM that it’s, you know, taking away from budget for other things like proper usability testing, QA departments, marketing, any of the things that make the product or the service actually work, then you should pull it on-prem. But you really need to make sure you have done the due diligence and the research to account for the additional work of making sure that it’s available, that it’s scaled, that you’ve got backups in place, right?
James Hunt
I have never once lost a GitHub repo because of something that happened on GitHub’s side. I’ve deleted repos by accident, even though they have a two-phase commit thing and everything, but I’ve never lost one because “oh yeah, that server’s hard drive gave out.” If you know for a fact that the additional effort involved in maintaining and managing all those points will be outweighed by the spend in the cloud, then I’d say go for a “buy”, but I think that’s very much a minority case.
Brian Seguin
I think one of the advantages of the buy scenario is that even if you do buy, you can still rent at the same time fairly easily and move pretty seamlessly between the two, right?
James Hunt
Especially if you’re in that SaaS-on-premise model for buy.
Brian Seguin
So, in summary: always rent, but buy if you have to?
James Hunt
Pretty much.
Brian Seguin
Okay, that sounds like a wrap.