It is interesting to watch the growth of an idea over time. The idea of using flags in code to do different things is as old as programming itself. Old-school configuration files used them, and they worked, so they were adapted.
Thing is, given an idea that works, humanity inevitably takes it too far, and flags to control software behavior seem to have reached that point. If you are using one set of flags for A/B testing, another to segregate code in development at runtime, another to turn functionality on for some users but not others, and yet another to hide data based upon the user, you’re buried in unnecessary code branches. If your organization isn’t great about cleaning up after flags are no longer needed, you are building spaghetti code that future developers will not thank you for.
The best answer to this growing burden is focus. What is the best use of feature flags/toggles in your environment? They certainly speed the delivery process: the code is already there, in nightly builds, and simply is not executed until the flag is flipped. This also helps with rollout, because operators can turn a feature on when it is ready by changing config rather than redeploying the application.
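To make that concrete, here is a minimal sketch of a config-driven flag in Python. Everything in it is made up for illustration: the flag name `new_discount_engine`, the JSON config shape, and the 10% discount are assumptions, not anything a particular flag library prescribes.

```python
import json


def load_flags(config_text):
    """Parse a JSON object mapping flag names to on/off values."""
    raw = json.loads(config_text)
    return {name: bool(value) for name, value in raw.items()}


def is_enabled(flags, name):
    # Unknown flags default to off, so shipped-but-unfinished code stays dark.
    return flags.get(name, False)


def checkout_total(prices, flags):
    total = sum(prices)
    if is_enabled(flags, "new_discount_engine"):  # hypothetical flag name
        # New code path: present in the build, run only when the flag is on.
        total = round(total * 0.9, 2)
    return total
```

Flipping `new_discount_engine` in the config changes behavior without a redeploy. Note that even this tiny example adds a branch that has to be removed later.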
For A/B testing, in many cases it makes more sense to handle switching at a load balancer, where again it is a config change, but separate instances serve the UI in question. This, like all decisions regarding flags, should be based upon the volume of change. Changing the contents of a list is likely easier to test with flags, but a large volume of change is probably better run separately and switched at the load balancer.
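In practice a load balancer does this split through its own configuration, not application code. Purely to show the logic, here is a rough Python sketch of a sticky percentage split between two pools of instances. The pool names and the `pick_pool` helper are hypothetical.

```python
import hashlib


def pick_pool(user_id, percent_b):
    """Send roughly percent_b percent of users to pool_b, the rest to pool_a."""
    # Hashing the user ID keeps the assignment sticky: the same user
    # lands in the same pool on every request.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "pool_b" if bucket < percent_b else "pool_a"
```

The application code behind each pool contains no flag checks at all. The variant is chosen by which instances receive the request, and the experiment ends by setting the percentage to 0 or 100.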
The same is true with the volume of change in source. A rollout might be the best option for handling large volumes of change, and feature flags may simply create more complexity.
As with everything, approach the problem with a plan that works best, in this instance, in your environment. There is no denying that the ability to turn on a new feature and then turn it off if something goes wrong is appealing. Just use the idea judiciously. And clean up after yourself. No shop that uses flagging extensively will say “yeah, we don’t clean it up,” because the mess of nested checks for flags would slow the code and make it exceedingly difficult to even read, let alone maintain, over time. So approach it from the beginning with “It is a tool, and when it no longer serves its intended purpose (the feature is fully live or abandoned), we remove it.”
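One way to make that cleanup promise hard to forget is to give every flag an owner and a removal date when it is created. The sketch below is an assumption about how a team might track this, not a feature of any specific flagging tool. The `Flag` record and `stale_flags` helper are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Flag:
    name: str
    owner: str        # who is responsible for removing the flag
    remove_by: date   # date by which the flag should be gone


def stale_flags(flags, today):
    """Return the names of flags that have outlived their removal date."""
    return [flag.name for flag in flags if flag.remove_by < today]
```

Run as part of a nightly build, a check like this can warn or fail when a flag has overstayed, which turns “we remove it” from an intention into a routine.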
And keep rocking it. You are the engine of business, and this is what you do. Make it great and keep it running.