Site Reliability Engineer (SRE) with over 4 years of experience managing critical infrastructure, optimizing system performance, and ensuring high availability across complex, global environments.
Recognized for designing and implementing robust, scalable, and secure cloud solutions that boost uptime and resilience. Demonstrated expertise in incident response, technical troubleshooting, and root cause analysis, minimizing downtime through proactive monitoring and automation.
There is a particular flavour of engineering dysfunction that looks, from the outside, like peak performance. Deployments are frequent. Sprint velocity is high. The feature backlog is shrinking. Leadership is pleased. And ...
There is a dangerous conflation happening across our industry right now. Teams are plugging LLM-powered agents into their deployment pipelines, calling it "agentic CI/CD," and treating it as the next logical step ...
There was a time when compliance meant a quarterly ritual. Someone from security would walk over with a spreadsheet, ask a few questions, tick a few boxes and disappear until the next audit cycle ...
In 2026, cloud cost overruns stop being finance’s problem and become an engineering responsibility. Here’s how treating cost as code finally makes FinOps work ...
Part 3: Discover how autonomous SRE transforms incident management and system reliability, enabling self-healing systems that reduce reliance on human intervention ...
Part 2: Discover how to harness incident history and AI to predict and prevent operational issues before they escalate, improving efficiency in Site Reliability Engineering ...
Part one of a three-part series: Discover how AI-driven reasoning agents are revolutionizing SRE practices by eliminating traditional toil and enhancing incident management ...
I used to think capacity planning was about setting up CloudWatch alarms and hoping they'd fire before things broke. Spoiler: that's not capacity planning—that's just reactive firefighting with extra steps. Real capacity ...