Kushagra Saxena

DevOps / Site Reliability Engineer

I love building fast, scalable, and automated cloud systems that just work. As a Site Reliability Engineer, I focus on designing infrastructure that is resilient, efficient, and hands-off, automated wherever possible. I believe in speed, ownership, and calculated risks. Many decisions are reversible, so I don't get stuck overanalyzing. Instead, I take action, solve problems, and never say, "That's not my job."

With expertise in GCP, Kubernetes, and Distributed Systems, I specialize in cloud automation, Infrastructure as Code (IaC), and CI/CD pipelines. I enjoy writing Go, creating self-healing infrastructure, and building monitoring solutions from scratch to improve system reliability. I'm always learning, experimenting, and finding ways to make cloud systems simpler, faster, and more reliable.

What I'm Doing

DevOps & Automation

Develop, Build, Deploy and Monitor. Workflow Automation, Scripting, Configuration Management.

Security & Monitoring

Access Control, Encryption, Vulnerability Scanning, Compliance, Threat Detection, Incident Response.

Site Reliability

Observability, Load Balancing, Scalability, High Availability and Incident Management.

Training & Mentoring

Training, Resources, Individual Roadmaps, Mentoring.

Professional Experience

5+ years of experience in Site Reliability Engineering and DevOps

Senior Site Reliability Engineer

Exabeam • 03/2024 - Present

  • Led migration from Looker Classic to Looker Core across Dev/Staging/Prod using Terraform; authored complete provisioning and deployment documentation.
  • Developed custom Kubernetes operators to automate Spanner autoscaling and per-customer Logstash deployments using Helm and ArgoCD.
  • Implemented Falco for runtime security with custom rules and GitOps-based rollout.
  • Enforced secure networking via Istio (namespace/FQDN-based policies) and iptables for restricting internet egress to VPC-only traffic.
  • Integrated Sysdig image scanning into GitHub Actions pipelines for vulnerability checks before image pushes to Google Artifact Registry.
  • Deployed and managed 10+ ClickHouse clusters with GCP Private Connect, IP whitelisting, Prometheus monitoring, and step-based autoscaling strategies.
  • Investigated and resolved incidents including CI/CD failures, DNS issues, and Kubernetes access problems.
  • Configured Grafana alerts routed to Opsgenie to enhance incident response workflows.
  • Reduced false positives and resource sprawl by identifying and cleaning up orphaned MongoDB instances and stale resources.
  • Owned end-to-end deployment of Windmill app using Kustomize and GitHub Actions with staged rollout across environments.

Member of Technical Staff - 2

VMware • 04/2022 - 12/2023

  • Automated Tanzu Kubernetes Grid (TKG) integration into Tanzu Mission Control (TMC); resolved issues across multi-cloud platforms (AKS, EKS, GKE).
  • Developed CI/CD pipelines using Jenkins and Concourse for end-to-end automation of code promotion and infrastructure provisioning.
  • Created a custom Concourse resource to detect code changes, build containers, and push to image registries.
  • Built and maintained a test automation framework (Ginkgo, Go) that works across multiple product interfaces, validating upgrades and cross-interface compatibility in TMC-SM and cutting testing time severalfold.
  • Designed automation for syncing container images across registries (e.g., Harbor ↔ ECR), improving image availability.
  • Extended the internal test platform to support public cloud Kubernetes clusters (EKS, AKS, GKE).
  • Automated AWS cost optimizations using API-driven scheduling and tagging strategies.
  • Implemented a source-control-based authorization provider for internal CI systems.

Senior Software Engineer

Capgemini • 09/2019 - 04/2022

  • Led development of a scalable test automation framework for Guidewire Insurance Suite (State Farm Insurance), reducing testing time by 8x.
  • Improved PolicyCenter message processing throughput by 11% through queue optimization.
  • Optimized backend services by reducing unnecessary DB queries, achieving 35% faster response times.
  • Deployed and managed enterprise apps on AWS Linux servers, including monitoring and performance tuning.
  • Built an automated headless server management system that reduced downtime via proactive alerting.
  • Designed tools (e.g., GAnalyzer) to validate Guidewire data models and reduce runtime schema issues.
  • Led development of an employee management portal (Spring Boot + Angular) with JWT-based access control.

Education & Certifications

  • B.Tech (Hons.) in Computer Science and Engineering - ABES Engineering College, Ghaziabad (07/2015 - 06/2019)
  • Oracle Certified Associate – Java SE 8 Programmer I (1Z0-808) - Scored 92%

My Skills

DevOps

GCP 90%
Kubernetes 95%
Terraform 90%
GitHub/GitLab 85%
CI/CD + Automation 95%

Programming

GoLang 90%
Python 85%
Shell Scripting 90%
Java 80%
JavaScript 75%

Portfolio

Business Ideas Platform: FullStack AI-Powered App

Featured

Built a full-stack web application using React (frontend), Golang (backend), and Neo4j (graph database) for managing business ideas, with user collaboration and like and comment features. Integrated Auth0 for secure user authentication and role-based access control. Used AI tools such as Cursor, Claude.ai, and other agents to accelerate development, perform code reviews, and generate boilerplate efficiently. Demonstrated applied experience in AI-assisted development workflows, enhancing productivity and speeding up feature delivery.

React Golang Neo4j Auth0 AI Tools

RTCM: Remote Traffic Control & Monitoring System

Featured

Designed and developed an IoT-based intelligent traffic control system using Python, Raspberry Pi, and Google Maps Public API. System dynamically adjusted traffic light timing based on real-time traffic data and congestion patterns. Built with a vision to enable choke-point analysis, live visualization, and improved city traffic planning. Demonstrated capabilities in hardware-software integration, real-time data streaming, and crowd-sourced analytics.

Python Raspberry Pi Google Maps API IoT Real-time Data

ExcelSheetMapper

Developed a tool using Apache POI (Java) to automate the extraction and consolidation of data from multiple Excel sheets. Designed logic to intelligently map and merge inputs into a single structured output file. Achieved a 93% reduction in manual copying errors, significantly increasing accuracy and operational throughput. Helped eliminate repetitive work in internal workflows, showcasing potential for enterprise data processing automation.

Java Apache POI Excel Automation Data Processing

Achievements & Recognition

Awards and recognition for exceptional contributions

Award 2024

Exabeam Recognition

Received multiple "Pat on the Back" awards for consistent high-impact contributions and ownership of critical SRE initiatives.

SRE Leadership Exabeam

Award 2022-2023

VMware Excellence

Received 5+ spot awards for exceptional performance on different products.

VMware Performance Multi-cloud

Innovation 2020

Capgemini Innovation

Named a winner for solution development under the WeSynergize 2020 initiative; developed a Spring/Angular-based web app.

Spring Boot Angular Web Development

Leadership 2017-2018

GDSC College Lead

Google Developer Student Clubs (07/2017 - 06/2018): developed app solutions for local businesses to bring them online and help grow their business.

Google Student Leadership Community

Get In Touch

Interested in working together? Let's discuss your infrastructure needs.

Email

Kushagra.saxena.3@gmail.com

Location

Delhi, India