Developing web and mobile apps is cool. But in the words of a classic ballad: “Don’t forget who’s taking you home and in whose arms you’re gonna be …” It’s the back-end services that underpin a lot of those cool apps and make them possible. So yes, servers and relational databases and enterprise applications and sometimes even mainframes are definitely still a thing. And in those environments, scripting, batch processing, application integration and operational visibility are a very big deal, and more often than not they are managed by workflow and job scheduling tools.
I argue that whatever is used to deliver all those capabilities, whether it is an application, code or something else, is logically part of the application and should be treated as such. The fact that “process X” needs to start only after certain data has arrived, been cleansed and been inserted into some database is just as much a part of the application as calculating an invoice amount or determining that inventory needs to be replenished.
And if you are still reading, then I submit that you need jobs-as-code.
Simply stated, jobs-as-code means that the jobs (the automation rules for business applications) you rely on to run and manage systems of record, back-end application components and data pipelines in production are built and managed in the same way as Java, Python, C++ or any other business logic code. In other words, everything that is part of the application should be included in the software development life cycle (SDLC).
Of course, that is simple to say but not so easy to accomplish, because traditional workflow, job and batch processing solutions have lived, and most continue to live, in the operations world. The practitioners of enterprise scheduling tend to prefer graphical interfaces with a strong emphasis on ease of use and rigid control over access. Traditionally, capabilities such as developer self-service and automated build, test and deployment have not been high on the wish lists for new enterprise job scheduling features. Most commercial solutions, if they do anything, add an API or two and consider the job done. Open-source authors build tools that cater to developers with little to no regard for the multitude of other user types that rely on batch workloads.
The result is that developers spend lots of time building “snowflake” operational plumbing that is difficult to test, difficult to embed in an automated delivery pipeline and that either has to be reworked to make it production-ready or becomes an albatross around the necks of the operations teams trying to keep it running. It’s time for the DevOps community and the industry to recognize that there is a better way, and we all instinctively know what it is.
Jobs-as-Code
To understand jobs-as-code, let’s examine the traditional dev-to-ops handoff from a developer and DevOps perspective.
I write some code and now I want to test it. The execution is simple: I want to pull some data from a relational back end, run my code to manipulate the data and push the result set to some other location to serve as a data source for a different application. I need to run a SQL query, run my Java code and then FTP or SFTP the result.
I’ve used some interactive tool to extract my test data, debugged my code in Eclipse and just browsed the output with Notepad++. Now I want to construct a flow that I or any of my teammates can run whenever we want.
Today, I probably dust off my scripting skills, Bash or Perl or whatever, and write a quick script to run the SQL query, run my Java code and then run the file transfer. Easy, right?
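Something like this first cut, perhaps. It's only a sketch: it assumes PostgreSQL's psql client, a Java runtime and OpenSSH's sftp are on the PATH, and every hostname, file name and table name is made up for illustration.

```python
#!/usr/bin/env python3
"""The 'quick script': extract the data, transform it with Java, push the result.

Illustrative only: psql, java and sftp are assumed to be on the PATH, and every
hostname, file name and table name is invented.
"""
import subprocess

# Step 1: pull the data (connection details hard-coded for now)
subprocess.run(
    ["psql", "-h", "db.example.com", "-U", "appuser", "-d", "sales",
     "-c", r"\copy (SELECT * FROM orders WHERE status = 'NEW') TO 'orders.csv' CSV"],
    check=True,
)

# Step 2: run the Java code over the extract
subprocess.run(["java", "-jar", "transform.jar", "orders.csv", "result.csv"], check=True)

# Step 3: push the result set to the downstream application's drop zone
subprocess.run(
    ["sftp", "-b", "-", "feeds@downstream.example.com"],
    input="put result.csv /incoming/result.csv\n",
    text=True,
    check=True,
)
```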
Maybe not.
Connecting to the database interactively was so trivial I didn’t even think about it, but doing it in the script is a whole different exercise. I’ve got to get my quotes and escape characters right; I have to make sure I connect without compromising the credentials needed to log in to the database; I even have to figure out what return codes I’m getting and what to do with them. Of course, I need to add some notification so that I’m informed if there is a problem.
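Sketching just that hardening of the extract step (assuming psql picks up the password from the PGPASSWORD environment variable and that there's an SMTP relay to send mail through; every host, address and variable name here is invented), the "simple" script already starts to balloon:

```python
import os
import smtplib
import subprocess
from email.message import EmailMessage

def notify(subject: str, body: str) -> None:
    """Bare-bones failure notification; SMTP host and addresses are invented."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "batch@example.com"
    msg["To"] = "dev-team@example.com"
    msg.set_content(body)
    with smtplib.SMTP("smtp.example.com") as smtp:
        smtp.send_message(msg)

# Keep the password out of the script; psql reads PGPASSWORD from the environment.
# DB_PASSWORD is a hypothetical variable set outside the script.
env = dict(os.environ, PGPASSWORD=os.environ["DB_PASSWORD"])

extract = subprocess.run(
    ["psql", "-h", "db.example.com", "-U", "appuser", "-d", "sales",
     "-c", r"\copy (SELECT * FROM orders WHERE status = 'NEW') TO 'orders.csv' CSV"],
    env=env,
    capture_output=True,
    text=True,
)
if extract.returncode != 0:
    notify("Extract step failed", extract.stderr)
    raise SystemExit(extract.returncode)
```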
OK, got all that done. Now running my Java code is a piece of cake, right? Well, it is when it’s running on my machine with my credentials, but maybe my teammate has a different environment. Or he’s a .NET guy and doesn’t have Java configured the same way as me. Or worse, maybe he doesn’t even have it installed. Better add code to check for all these problems, and notification for that, too. And when I get that failure notification, what do I look at to figure out what went wrong? I’d better pipe everything to a log, and put that log on a shared filesystem so others can view it, too.
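Another illustrative fragment, this time guarding against a missing Java runtime and capturing the output in a shared log location (the path is invented):

```python
import shutil
import subprocess
from datetime import datetime
from pathlib import Path

# A shared log location so teammates (and on-call) can see what happened; the path is invented.
LOG_DIR = Path("/mnt/shared/batch-logs")
LOG_DIR.mkdir(parents=True, exist_ok=True)
log_file = LOG_DIR / f"transform-{datetime.now():%Y%m%d-%H%M%S}.log"

# Fail fast with a clear message if the runtime isn't even installed.
if shutil.which("java") is None:
    raise SystemExit("java not found on PATH; install a JRE or fix the environment")

# Capture stdout and stderr in one log so there is something to look at when it fails.
with log_file.open("w") as log:
    step = subprocess.run(
        ["java", "-jar", "transform.jar", "orders.csv", "result.csv"],
        stdout=log,
        stderr=subprocess.STDOUT,
    )
if step.returncode != 0:
    # The same kind of notification as the extract step would go here.
    raise SystemExit(f"Transform step failed; see {log_file}")
```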
Let’s not even start on the file transfer! Or on what happens when you have to add additional steps, or run this at 3 a.m. as part of some testing another group is doing, or make sure the database you rely on is up and running, etc., etc., etc.
Eventually, after way more time than I ever thought it would take, that massive script is done. I run it through some test cases and call it finished.
On to the rest of the delivery phases with more testing and probably some more extensions that make the script truly epic.
In those last few stages of getting to production, someone may raise the issue that this new application interacts with some other application that’s managed by some production automation tool. OK, let the rework (or, really, the brand-new unplanned work) begin. The developers aren’t familiar with the tool and the tool administrators aren’t familiar with the application, so you can imagine how well those conversations go. However, they are all professionals and eventually manage to get the job done, and the application proceeds to production.
If this long, drawn-out story didn’t already make you uneasy, you may have caught the huge fail that is now just waiting to blow up in our proverbial faces. Most, if not all, of the testing was already completed. The new work done at the eleventh hour gets minimal testing, if any, is deployed and (surprise!) has a high failure rate in production.
To add insult to injury, somewhere down the road, after all the heroics of getting the application deployed are long forgotten, a failure occurs, as is only natural and to be expected. The poor on-call folks must navigate a production tool they aren’t familiar with and that isn’t part of their world. And then they encounter that massive, epic script. What’s going on inside of that thing, and where did the failure occur? Where are the logs, and do we even have all the ones we need? And, saving the best for last, after the problem has been identified, how do we restart this process in the middle when half of the processing has already completed?
So What Would Nirvana Look Like?
What if you could create a database query job just by writing the SQL statement(s), picking the database you want from a list of logical names and requesting notification in case of error, and you could express all of those requirements in a few simple statements? If you want to run it at 3 a.m., you just add that requirement, too, and then say “run.” The availability of the database is checked, the credentials are kept secure, there is no scripting, log and output are captured automatically (not just for this run but for as many runs as you want, so you can compare one run with another, etc.) and you get whatever notifications you requested. The logic would be solid because it has been in use for years by organizations like yours, rather than being written under pressure while trying to meet a deadline.
You could write your automation rules (jobs) in some simple notation you’re already familiar with, such as JSON, and store them together with your Java code in Git or whatever SCM you use. The jobs could be packaged with the rest of your application, and your build, configuration management and testing tools could push your entire application, including your jobs, through the entire delivery pipeline, so there are no surprises or additional work required before deploying to production. The facilities for operational visibility, problem analysis and log and output management would be second to none, because they have been evolving with input from the community and IT professionals.
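As a purely illustrative sketch, with a made-up schema that doesn't correspond to any particular scheduler's format, the flow from my story might be declared as data and committed to the same repository as the Java code:

```python
import json
from pathlib import Path

# A hypothetical job definition. The schema is purely illustrative and does not
# correspond to any particular scheduler's actual format.
orders_flow = {
    "flow": "orders-extract",
    "run_at": "03:00",
    "on_failure": {"notify": "dev-team@example.com"},
    "jobs": [
        {
            "name": "extract",
            "type": "database-query",
            "connection": "sales-db",  # logical name, resolved per environment
            "sql": "SELECT * FROM orders WHERE status = 'NEW'",
            "output": "orders.csv",
        },
        {
            "name": "transform",
            "type": "java",
            "jar": "transform.jar",
            "args": ["orders.csv", "result.csv"],
            "depends_on": ["extract"],
        },
        {
            "name": "deliver",
            "type": "file-transfer",
            "connection": "downstream-sftp",
            "source": "result.csv",
            "destination": "/incoming/result.csv",
            "depends_on": ["transform"],
        },
    ],
}

# Serialized to JSON and committed next to the application source, so it is
# versioned, reviewed, built, tested and deployed like any other code.
Path("jobs").mkdir(exist_ok=True)
Path("jobs/orders-extract.json").write_text(json.dumps(orders_flow, indent=2))
```

The notation itself isn’t the point; the point is that the job definition becomes an artifact your pipeline can treat exactly like the rest of the code.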
Imagine if this magical-sounding solution that implements jobs-as-code were available from GitHub and other public sources. You could download it, install it and have access to jobs-as-code in minutes. You wouldn’t have to submit access requests to IT folks; no purchase requisitions, no messy and complex installations.
You don’t have to imagine. Jobs-as-code solutions already exist. If the tools you’re using don’t give you these capabilities, switch! At the very least, start talking to your peers and let them know a better way has arrived. With sufficient demand from the community, the vendors and authors behind the myriad job management tools may just accommodate you.