A fairly common task for operations folks is managing URLs. That usually means using redirects or rewrites to handle vanity URLs, marketing campaign URLs, or simply getting users from old, deprecated URLs to current ones. The business purpose behind the need to manage URLs aside, the process for doing so can become quite unwieldy and time-consuming.
That’s because URL redirection and rewriting have traditionally been a layer 7 proxy or web server function that requires modifying a configuration file. For example, here’s a simple URL rewriting configuration excerpt for Apache:
RewriteEngine On
RewriteBase /books
# If there's a cookie called thgtp-pre-version set,
# use its value and serve the page
RewriteCond %{HTTP_COOKIE} thgtp-pre-version=([^;]+)
RewriteRule ^guide-to-node$ %{REQUEST_FILENAME}-%1.html [L]
The key here is that it’s in the configuration file. This is just one rewrite rule. Imagine needing hundreds of these rules and then imagine them being changed on a weekly or even daily basis. Not only is this likely not the job you signed up for, it’s inevitable that at some point you’re going to fat-finger a rule change, and you know what that means. Yup. Apache no start. And you get to figure out where the typo is.
In some architectures, rewrite/redirect rules are simply made part of the web server configuration, so each and every web server has to have its configuration file updated when a single rule changes. More modern approaches take advantage of proxies, which greatly reduces the number of instances of Apache (or whatever proxy you’re using) that need to be updated.
Regardless, it’s a pretty mundane and boring thing to do, and honestly, the whole process of having marketing (or whoever) kick off a request for a new rewrite/redirect rule and having the ticket make its way to your desk probably takes more time than it does to enter the rule and type restart httpd.
So how about we leverage some programmability in the network to automate this bad boy and, if you want, hand off the tedious management of the URLs to the people who have ownership of them (business and marketing folks)?
Programmable Proxies
I’ve written before about programmable components, particularly those in the data path, and one of those components can be described as simply a “programmable proxy.” Proxies are not unfamiliar to dev or ops, so I’m going to assume we’re good on that front. The programmable piece comes in when you move the logic that’s encoded in the configuration file (like rewrites and redirects) into a programmatic environment. So rather than configure the behavior of the proxy, you code it.
So how does that help, you might ask? Well, for starters it makes the phrase “infrastructure as code” ring a heck of a lot more true because, well, it really is code. That means it becomes an artifact like any other piece of code and gets managed like code. That means simpler rollbacks and version control.
Second, it means if you’ve got a robust enough environment you can extend the logic to include data sources like a database (SQL or NoSQL, your choice) that manage a mapping of old URL to new URL. And if it’s in a database, it’s a pretty simple thing to have someone put a pretty HTML form in front of it that enables business folks to manage that list.
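To make the "form in front of a database" idea a little more concrete, here is a minimal sketch of the function such a form handler might call. Everything in it is an assumption for illustration: the url_map table, the save_mapping name, the validation rules, the in-memory SQLite store, and the second target path are all hypothetical, and a real setup would pick its own schema and database.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Open a throwaway in-memory store (a stand-in for a real database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS url_map ("
        "old_path TEXT PRIMARY KEY, new_path TEXT NOT NULL)"
    )
    return conn


def save_mapping(conn: sqlite3.Connection, old_path: str, new_path: str) -> None:
    """Add or update one old-URL to new-URL mapping after basic sanity checks."""
    for path in (old_path, new_path):
        # Reject input that cannot be a URL path before it reaches the proxy.
        if not path.startswith("/") or any(ch.isspace() for ch in path):
            raise ValueError(f"not a usable URL path: {path!r}")
    if old_path == new_path:
        raise ValueError("mapping would point a URL at itself")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO url_map (old_path, new_path) VALUES (?, ?)",
            (old_path, new_path),
        )


store = open_store()
save_mapping(store, "/TVad", "/marketing/tv/2014")
# Editing an existing vanity URL is just another save (hypothetical new target).
save_mapping(store, "/TVad", "/marketing/tv/2015")
print(store.execute("SELECT old_path, new_path FROM url_map").fetchall())
```

The point of validating at write time is that a bad entry gets rejected at the form, rather than taking down the proxy the way a typo in a configuration file can.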
The actual execution is straightforward: the proxy receives the request for one URL, say www.example.com/TVad, queries the database, and gets back the real URL, say www.example.com/marketing/tv/2014. It’s not that much different from a configuration-based approach, with the exception that you’re able to dynamically determine the mapping rather than hard-code it in a file.
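The lookup step described above might look something like the following. This is only a sketch: the url_map table, the resolve function, and the use of an in-memory SQLite database are assumptions made for the example, not the API of any particular proxy.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory mapping table seeded with the example URL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE url_map (old_path TEXT PRIMARY KEY, new_path TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO url_map (old_path, new_path) VALUES (?, ?)",
        ("/TVad", "/marketing/tv/2014"),
    )
    conn.commit()
    return conn


def resolve(conn: sqlite3.Connection, path: str) -> str:
    """Return the mapped path, or the original path when no mapping exists."""
    row = conn.execute(
        "SELECT new_path FROM url_map WHERE old_path = ?", (path,)
    ).fetchone()
    return row[0] if row else path


conn = build_demo_db()
print(resolve(conn, "/TVad"))   # /marketing/tv/2014
print(resolve(conn, "/about"))  # /about (no mapping, passed through unchanged)
```

Because the mapping lives in a table rather than a file, changing where /TVad points is a row update, with no proxy restart involved.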
Now, what the proxy does with it is up to you. You can send a redirect to the client with the “real” URL, or (a better choice if you care at all about performance) rewrite it and send it on to the right server.
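The two options can be sketched as a single decision function. The ProxyAction type, the decide function, the in-memory URL_MAP, and the choice of a 302 status are all assumptions for illustration; a real proxy exposes its own hooks for sending a response or changing the upstream request.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ProxyAction:
    kind: str               # "redirect", "rewrite", or "pass"
    status: Optional[int]   # HTTP status sent to the client (redirects only)
    target: str             # Location header value, or path sent upstream


# Hypothetical mapping; in practice this would come from the database lookup.
URL_MAP: Dict[str, str] = {"/TVad": "/marketing/tv/2014"}


def decide(path: str, url_map: Dict[str, str], use_redirect: bool) -> ProxyAction:
    """Choose what the proxy does with an incoming request path."""
    new_path = url_map.get(path)
    if new_path is None:
        # No mapping: forward the request untouched.
        return ProxyAction("pass", None, path)
    if use_redirect:
        # The client sees the real URL and has to issue a second request.
        return ProxyAction("redirect", 302, new_path)
    # The proxy swaps the path and forwards; the client never sees the change.
    return ProxyAction("rewrite", None, new_path)


print(decide("/TVad", URL_MAP, use_redirect=True))
print(decide("/TVad", URL_MAP, use_redirect=False))
print(decide("/about", URL_MAP, use_redirect=False))
```

The performance difference comes from the extra round trip: a redirect makes the client request the new URL itself, while a rewrite is resolved inside the proxy in a single pass.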
A programmable proxy should be a key tool in every devops practitioner’s back pocket. With both network and application fluency, a programmable proxy provides the foundation for a variety of architectural patterns that rely on dynamic evaluation of application-layer requests in real time. The programmable proxy is well-suited for these tasks because it can be inserted into the data path nearly transparently, and doesn’t require special networking (or perhaps more importantly, the cooperation of the networking teams).