I’m trying to move my org to a more GitOps-style workflow. I was thinking a good way to do promotions between environments would be to auto-sync based on PR labels.
Thinking about it though, because you can apply the same label to multiple different PRs, I can see situations where there would be conflicts. Like: a PR is labeled “qa” so it’s promoted to the qa env and automated testing starts; then a different change is ready, that PR is also labeled “qa”, and it would sync, overwriting the currently deployed version in qa. I obviously don’t want this.
Is there a way to enforce that a label can only be applied to a single PR at a time across a repository? Or maybe there is some kind of queue system out there that I’m not aware of?
I’m using GitHub, Argo CD, and CircleCI.
I would recommend you avoid relying on features of GitHub, and only use features of git. You never know when you might decide to switch repo hosting providers!
With that said, you’ve got a number of options: you can use tags or branches as “labels” to choose what’s applied to which environment, or, depending on the flavor of IaC you’re using, have an entry point for each environment in your code that includes and parameterizes a common “environment” module.
Branching per environment gets to be a nightmare really quickly. I am trying to avoid that.
I am not worried about vendor lock in.
I agree. What I’m proposing is, if you go with that option, that you use a branch as a “single instance label”, pointing at commits within your main branch. Don’t use it as an actual branch for additional environment-specific commits.
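Something like this, as a minimal sketch; the “qa” branch name and the commit hash are placeholders, and it assumes whatever deploys the qa environment (Argo CD in your case) tracks that branch:

    # promote a commit that already exists on main by moving the qa pointer
    git fetch origin main
    git branch -f qa 1a2b3c4            # hypothetical commit on main to promote
    git push --force origin qa          # the branch is only ever a moving pointer

Since a branch can only point at one commit at a time, you get the “single instance” property for free.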
I’ve done something similar. In my case it was a startup script that did something like the following (there’s a rough sketch of the loop after the list):
- poll GitHub using the search API for PR labels (note that this has sometimes stopped returning correct results, but …).
- always do this once at startup
- you might do this based on notifications; I didn’t bother since I didn’t need rapid responsiveness. Note that you shouldn’t act on the specific data in a notification though; treat it only as a way to wake up the script.
- but no matter what, you should do this after N minutes, since notifications can be lost.
- perform a git fetch for your main development branch (the one you perform the real merges to) and all pull/ refs (git does not do this by default; you’ll have to set them up for your local test repo, see the refspec config after the list. Note that you want to refer to the unmerged commits for these)
- if the set of commits for all tagged PRs has not changed, wait and poll again
- reset the test repo to the most recent commit from your main development branch
- iterate over all PRs with the appropriate label:
- ordering notes:
- if there are commits that have previously passed testing, you might merge them first. But still test again, since the merge order could be different. This of course depends on the level of tests you’re doing.
- if you have PRs that depend on other PRs, do them in an appropriate order (perhaps the following will suffice, or maybe you’ll have some way of detecting this). As a rule we soft-forbid this though; such PRs should have been merged early.
- finally, ordering by PR number is probably better than ordering by last commit date
- attempt the merge (or rebase). If it’s a no-op, log that somewhere. If it’s not clean, skip the PR for now (and log that), but only mark it as an error if no earlier PR has been merged in this run (if a later merge conflicts, it could be a prior PR’s fault).
- Run pre-build stuff that might need to create further commits, build the product, and run some quick tests. If they fail, roll the repo back to the state before this merge and complain.
- Mark the commit as apparently good. Note that this applies specifically to commits, not PRs or branch names; I admit I’ve been sloppy about that above.
- perform a pre-build, build and quick test again (since we may have rolled back and have a dirty build - in fact, we might not have ended up merging anything!)
- if you have expensive tests, run them only here (and treat a failure like the “unexpected early exit” case below). It’s presumed that separate parts of your codebase aren’t too crazily entangled, so if a particular test fails it should be “obvious” which PR is relevant. Keep in mind that I used this system for assumed-viable work-in-progress PRs.
- kill any existing instance, launch a new instance of the product using the build from the final merged commit, and begin accepting real traffic from devs and beta users.
- users connecting to the instance should see the log
- if the launched instance exits unexpectedly within M minutes AND we actually ended up merging anything into the known-good branch, then reset to the main development branch (and build, etc.) so that people at least have a functioning test server, but complain loudly in the MOTD when they connect to it. The condition here means that if it exits suddenly again, the whole script goes back to the top and starts again, which may be necessary if someone intentionally tried to kill the server to force a new merge sequence but it was too soon.
- alternatively you could try bisecting the set of PR commits or something, but I never bothered. Note that you probably can’t use git bisect for this, since you explicitly do not want to try a commit from the middle of a PR. It might be simpler to whitelist or blacklist one commit at a time, but if you’re failing here, remember that all tests are unreliable.
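For the pull/ refs: GitHub exposes every PR’s head under refs/pull/&lt;number&gt;/head, so one way to set up the local test repo (assuming the remote is called origin) is a one-time refspec addition:

    # fetch every PR head as origin/pr/<number> alongside the normal branches
    git config --add remote.origin.fetch '+refs/pull/*/head:refs/remotes/origin/pr/*'
    git fetch origin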
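And to make the overall shape concrete, here’s a very rough sketch of the loop (not the actual script): it assumes the gh CLI, a clone at /srv/test-repo with the pull/ refspec above, origin/main as the real development branch, and hypothetical ./build.sh and ./quicktest.sh stand-ins for the pre-build/build/quick-test steps. The launch, MOTD, and M-minute watchdog parts are left out.

    #!/usr/bin/env bash
    set -u
    cd /srv/test-repo      # local clone with the pull/ refspec configured
    label=qa

    while true; do
      # poll for open PRs carrying the label, ordered by PR number
      prs=$(gh pr list --state open --label "$label" --json number --jq '.[].number' | sort -n)

      git fetch origin
      git reset --hard origin/main   # start from the real development branch

      for pr in $prs; do
        before=$(git rev-parse HEAD)
        if git merge --no-ff --no-edit "origin/pr/$pr"; then
          # a no-op merge ("Already up to date") also lands here; log it if you care
          if ./build.sh && ./quicktest.sh; then
            echo "PR #$pr looks good at $(git rev-parse HEAD)"
          else
            echo "PR #$pr failed quick tests, rolling back" >&2
            git reset --hard "$before"
          fi
        else
          echo "PR #$pr did not merge cleanly, skipping" >&2
          git merge --abort
        fi
      done

      # final pre-build/build/quick test on whatever survived, then relaunch
      ./build.sh && ./quicktest.sh && echo "relaunch the test instance here"

      sleep 300   # "after N minutes"; a notification hook could also wake this up
    done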
GitHub has an Environments feature that is probably what you’re looking for. It will tell you exactly which PR is deployed to an environment at any point in time. I would recommend automatically removing labels after the deploy is done, so that you’re not depending on reading which labels are active; instead, just use the Environments.
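A hedged sketch of what that could look like at the end of the deploy job, assuming the gh CLI and that the job knows the PR number and the commit SHA it shipped (PR_NUMBER, OWNER, REPO, and SHA are placeholders); note the Deployments API call can be rejected if required status checks haven’t passed on that ref:

    # the label now only means "please promote", not "currently deployed"
    gh pr edit "$PR_NUMBER" --remove-label qa
    # record the deployment so the qa environment shows exactly what is running
    gh api "repos/$OWNER/$REPO/deployments" -f ref="$SHA" -f environment=qa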
… it would sync overwriting the currently deployed version in qa. I obviously don’t want this.
Then don’t allow the deployed software to be overwritten while tests are running? Your test environment should be treated as a singleton. Your CI system shouldn’t be able to affect the deployment while it’s being used.
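One way to get that with the tools you already have (a sketch, assuming the argocd CLI and an Argo CD application for the qa env called qa-env, which is a made-up name): have the test job take the environment as a lock by turning auto-sync off, and hand it back when the run finishes.

    argocd app set qa-env --sync-policy none        # tests start: freeze qa
    # ... run the test suite ...
    argocd app set qa-env --sync-policy automated   # tests done: allow the next promotion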