Open to new opportunities

Sarath Kumar K.

Glasgow, UK

Senior DevOps Engineer

9 years designing, automating, and operating production-grade cloud platforms on AWS and Azure. I build the infrastructure that makes delivery fast, reliable, and observable — and then automate everything so it stays that way.

At a glance

Experience: 9 years

Current role: Scottish Govt · SPM

Cloud platforms: AWS · Azure

Infra cost saved: ~20%

IaC coverage: 95%

CI/CD automation: 90%

Observability: 88%

What I bring

Core capabilities

☁️

Cloud Infrastructure

Production-grade AWS & Azure environments built with Terraform, CloudFormation, and Bicep. Multi-AZ and auto-scaling, with infrastructure costs cut by roughly 20%.

🔄

CI/CD Engineering

End-to-end pipelines with Jenkins, GitHub Actions, GitLab CI. Build → test → containerise → deploy with zero manual steps in production.

🐳

Containers & Orchestration

Docker and Kubernetes at scale — Helm charts, Ingress, HPA, RBAC across dev, staging, and prod. Formerly kubeadm clusters, now EKS.

📊

Observability

Prometheus + Grafana + CloudWatch stacks. SLO tracking, alerting runbooks, and incident dashboards that actually surface the right signal.

🔐

Security & Compliance

GDPR-aligned cloud security: IAM least-privilege, KMS encryption, Secrets Manager, CloudTrail audit trails, and zero-trust network design.

🤖

Automation & SRE

Python & Ansible automation frameworks. Idempotent playbooks, webhook pipelines, and Slack self-service workflows to kill toil at source.

Certifications

🏅

AWS Certified Cloud Practitioner

Amazon Web Services · February 2025

🏅

Terraform Associate

Mphasis · December 2024

Career history

9 years of impact

From project engineer to senior DevOps lead — building platforms that ship faster, fail less, and cost less to run.

Jul 2022 – Present · 3+ years

Senior DevOps Engineer

Mphasis · Scottish Government (SPM)

Architected production-grade AWS infrastructure with Terraform — modular IaC for VPCs, EC2, ALB/NLB, ASGs, RDS, S3, IAM, KMS, and Secrets Manager across multiple environments
Led a DevOps team of 3: mentored engineers, assigned work, and drove delivery cadence to cut incident resolution time
Engineered end-to-end CI/CD pipelines with Jenkins, Docker, and Kubernetes across dev, non-prod, and production environments
Achieved approximately 20% infrastructure cost reduction through resource right-sizing and Auto Scaling optimisation
Implemented Prometheus/Grafana monitoring across all platforms; introduced Slack-based self-service workflows for common support requests
Established GDPR-aligned cloud security: role-based access, KMS encryption, secrets handling, and full audit trails via CloudTrail
Created and maintained comprehensive runbooks, IaC standards, and troubleshooting procedures on Confluence

Jan 2021 – Jul 2022 · 1.5 years

Senior DevOps Engineer

Mphasis HP · Sirius Dev Environment

Owned build, release, and monitoring processes for Sirius firmware in Agile environments
Designed and maintained multi-branch CI/CD pipelines using Jenkins, Docker, and Kubernetes for repeatable, safer releases
Implemented IaC with Terraform across AWS and Azure, improving scalability and disaster recovery capabilities
Led migration of legacy applications to Kubernetes, improving observability and rerun orchestration
Developed automation frameworks using Shell, Python, Ruby, and Ansible for multi-cloud environment provisioning
Managed artifact versioning and release with Jenkins and Nexus; applied Power Platform ALM practices
Mentored junior engineers on DevOps best practices and incident runbook adoption

Aug 2018 – Jan 2021 · 2.5 years

DevOps Engineer

ThinkPalm Technologies · RADview

Migrated RADview from legacy architecture to CI/CD using Docker, Ansible, Kubernetes, and Helm
Configured and maintained Kubernetes clusters (COE) with Helm charts for production environments
Implemented Prometheus/Grafana monitoring and automated alerting integration
Acted as release engineer for RADview migrations and production rollouts, handling environment consistency and rollback planning

Jul 2016 – Aug 2018 · 2 years

Project Engineer

Wipro Technologies · Microfocus Cloudassessment

Established CI/CD using Jenkins and Docker for Microfocus Cloudassessment and Enterprise Maps
Built test automation from scratch with Jenkins and Selenium to improve release quality
Owned release responsibilities: packaging, environment coordination, and delivery management

Technical competencies

The full stack

Deep expertise across cloud, infrastructure, containers, automation, security, and observability. Expert = daily-use, production-proven.

☁️ Cloud Platforms

AWS · Azure · EC2 · EKS · S3 · RDS · VPC · ALB / NLB · Route 53 · CloudFront · Azure AKS · Azure AD

🏗️ Infrastructure as Code

Terraform · Ansible · CloudFormation · Bicep · Helm · Kustomize

🔄 CI/CD & DevOps

Jenkins · GitHub Actions · GitLab CI · Azure DevOps · AWS CodePipeline · Power Platform ALM

🐳 Containers & Orchestration

Docker · Kubernetes · EKS · Helm · NGINX Ingress · HPA · RBAC

📊 Monitoring & Observability

Prometheus · Grafana · CloudWatch · Azure Monitor · Kibana · ELK Stack

🔐 Security & Governance

IAM · KMS · Secrets Manager · SSM · CloudTrail · Key Vault · GDPR · RBAC

💻 Scripting & Languages

Python · Bash / Shell · Groovy · Ruby · YAML · HCL

🗄️ Data, Messaging & More

Athena · Glue · Redshift · RDS · SNS · SQS · Nexus · Git · Jira · Confluence

This very website

How it's built & hosted

This portfolio runs on a real, self-managed Kubernetes platform — not a static site host. Source lives on GitLab, Jenkins builds and ships every change, and the app serves over HTTPS from a Kubernetes cluster on AWS, inside a locked-down VPC.

Deployment architecture — VPC, security groups, Kubernetes & observability

🟦 Network & compute · 🟨 Security boundary (SG / IAM) · 🟩 Registry & secrets · 🟪 Observability stack — shown here as the target monitoring design for this platform.

Security design

🔐

VPC isolation

App workloads sit in a private subnet inside a dedicated VPC. Only the ingress layer is reachable from the public subnet — no direct exposure of pods or the Kubernetes API.

🧱

Security groups

Separate SGs for the public-facing ingress (80/443 open, 22 restricted to a trusted IP) and the Kubernetes node (only 3000/5000 reachable from the ingress SG, nothing public).
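The two security groups described above can be read as a small allow-list. The sketch below models them in Python to make the reachability rules explicit. Only the port numbers come from the text; the group names `ingress-sg` and `node-sg` and the trusted address `203.0.113.10` (a documentation-range IP) are hypothetical placeholders, and the model ignores protocols and egress rules.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    port: int
    source: str  # a CIDR block, or the name of another security group


# Hypothetical names and addresses; only the ports are taken from the text.
TRUSTED_ADMIN_CIDR = "203.0.113.10/32"
RULES = {
    "ingress-sg": [
        Rule(80, "0.0.0.0/0"),
        Rule(443, "0.0.0.0/0"),
        Rule(22, TRUSTED_ADMIN_CIDR),
    ],
    "node-sg": [
        Rule(3000, "ingress-sg"),
        Rule(5000, "ingress-sg"),
    ],
}


def is_allowed(target_sg, port, source_ip=None, source_sg=None):
    """Return True if inbound traffic to target_sg on port matches a rule."""
    for rule in RULES[target_sg]:
        if rule.port != port:
            continue
        if rule.source in RULES:
            # The rule references another security group, not an address range.
            if rule.source == source_sg:
                return True
        elif source_ip is not None and ip_address(source_ip) in ip_network(rule.source):
            return True
    return False


if __name__ == "__main__":
    print(is_allowed("ingress-sg", 443, source_ip="198.51.100.7"))
    print(is_allowed("node-sg", 3000, source_ip="198.51.100.7"))
    print(is_allowed("node-sg", 3000, source_sg="ingress-sg"))
```

Referencing the ingress group by name rather than by CIDR is what keeps the node ports closed to everything except the ingress layer, even if addresses change.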

🛡️

Network ACLs

A stateless subnet-level network ACL backs up the security groups as a second layer of defence, in line with AWS's defence-in-depth guidance.

🔑

IAM least privilege

The EC2 instance profile grants only the permissions needed to pull from ECR and write CloudWatch metrics — no broad admin access.
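As an illustration of what "pull from ECR and write CloudWatch metrics" can look like as a policy document, here is a minimal sketch that builds one in Python. It is not the policy actually attached to the instance profile; the function name and the repository ARN are assumptions, and the `<account-id>` placeholder is left unfilled on purpose.

```python
import json


def instance_profile_policy(repository_arn):
    """Build a pull-only ECR + CloudWatch metrics policy (illustrative)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Issuing a registry login token cannot be scoped to one repository.
                "Effect": "Allow",
                "Action": "ecr:GetAuthorizationToken",
                "Resource": "*",
            },
            {
                # Read-only image pull actions, limited to a single repository.
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                ],
                "Resource": repository_arn,
            },
            {
                # Publishing custom metrics; no read or alarm management rights.
                "Effect": "Allow",
                "Action": "cloudwatch:PutMetricData",
                "Resource": "*",
            },
        ],
    }


def allowed_actions(policy):
    """Flatten every Action entry of the policy into one list."""
    actions = []
    for statement in policy["Statement"]:
        action = statement["Action"]
        actions.extend([action] if isinstance(action, str) else action)
    return actions


if __name__ == "__main__":
    # Hypothetical ARN; the account id is deliberately left as a placeholder.
    arn = "arn:aws:ecr:eu-north-1:<account-id>:repository/portfolio-frontend"
    print(json.dumps(instance_profile_policy(arn), indent=2))
```

The point of the shape is that no action uses a wildcard and nothing grants push or delete rights on the registry.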

🗝️

KMS & Secrets Manager

Sensitive values are encrypted at rest with KMS and never committed to source; Kubernetes Secrets and the ECR pull token are generated at deploy time, not stored in Git.

📜

CloudTrail audit log

All API-level account activity is logged through CloudTrail, giving a traceable audit history of changes to the infrastructure.

Request & deployment flow

# deployment flow
GitLab push → Jenkins (EC2) → docker build → push to Amazon ECR
  → kubectl apply on Kubernetes EC2 → rolling update, zero downtime

# live request flow
Browser → https://sarathportfolio.cloud
  → Traefik Ingress (TLS via cert-manager)
      → /     → portfolio-frontend service → Nginx pods (React build)
      → /api  → portfolio-backend service  → Node.js Express pods
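The path split in the live request flow can be sketched as a longest-prefix match. This is a simplified Python model of Kubernetes-style `Prefix` matching, not Traefik's implementation; the route table is taken from the flow above, while the function names are illustrative.

```python
# Route table from the request flow: path prefix -> Kubernetes service.
ROUTES = [
    ("/api", "portfolio-backend"),
    ("/", "portfolio-frontend"),
]


def prefix_matches(prefix, path):
    """Match whole path segments, so /api matches /api/x but not /apiary."""
    if prefix == "/":
        return path.startswith("/")
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def pick_service(path):
    """Return the service for the longest matching prefix, or None."""
    best = None
    for prefix, service in ROUTES:
        if prefix_matches(prefix, path):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, service)
    return best[1] if best else None


if __name__ == "__main__":
    for path in ("/", "/api/health", "/apiary", "/static/app.js"):
        print(path, "->", pick_service(path))
```

Matching on whole segments is what stops a frontend route such as `/apiary` from being sent to the backend by accident.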

k8s/deployment.yaml — rolling update, zero downtime

apiVersion: apps/v1
kind: Deployment
metadata:
  name: portfolio-frontend
  namespace: portfolio
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate: { maxSurge: 1, maxUnavailable: 0 }  # always 2 pods up
  selector:  # required by apps/v1; the label value here is assumed
    matchLabels: { app: portfolio-frontend }
  template:
    metadata:
      labels: { app: portfolio-frontend }  # must match the selector
    spec:
      imagePullSecrets: [{ name: ecr-registry-secret }]
      containers:
        - name: frontend
          image: <account>.dkr.ecr.eu-north-1.amazonaws.com/portfolio-frontend:<build>
          ports: [{ containerPort: 3000 }]
          livenessProbe: { httpGet: { path: /health, port: 3000 } }
          readinessProbe: { httpGet: { path: /health, port: 3000 } }
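To see why `maxSurge: 1` with `maxUnavailable: 0` keeps two pods serving throughout, the rollout can be stepped through in a few lines of Python. This is a simplified model, not the Deployment controller's algorithm: it assumes every new pod passes its readiness probe before an old pod is removed, and the function name is illustrative.

```python
def rolling_update(replicas=2, max_surge=1, max_unavailable=0):
    """Return the (old_pods, new_pods) counts after each rollout step."""
    old, new = replicas, 0
    timeline = [(old, new)]
    while old > 0 or new < replicas:
        # Scale up: total pods may not exceed replicas + max_surge.
        can_add = min(replicas - new, replicas + max_surge - (old + new))
        if can_add > 0:
            new += can_add
            timeline.append((old, new))
        # Scale down: ready pods may not drop below replicas - max_unavailable.
        can_remove = min(old, (old + new) - (replicas - max_unavailable))
        if can_remove > 0:
            old -= can_remove
            timeline.append((old, new))
        if can_add <= 0 and can_remove <= 0:
            raise ValueError("max_surge and max_unavailable cannot both be 0")
    return timeline


if __name__ == "__main__":
    for old, new in rolling_update():
        print(f"old={old} new={new} total={old + new}")
```

With the values from the manifest the pod count moves 2 → 3 → 2 → 3 → 2, so capacity never dips below the two replicas requested.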

k8s/ingress.yaml — Traefik routing + Let's Encrypt TLS

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: portfolio-ingress
  namespace: portfolio
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik
  tls:
    - hosts: [sarathportfolio.cloud, www.sarathportfolio.cloud]
      secretName: sarathportfolio-tls
  rules:
    - host: sarathportfolio.cloud
      http:
        paths:
          - path: /
            pathType: Prefix  # required in networking.k8s.io/v1
            backend: { service: { name: portfolio-frontend, port: { number: 80 } } }

k8s/hpa.yaml — Horizontal Pod Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: portfolio-frontend-hpa, namespace: portfolio }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: portfolio-frontend }
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 70 } }
    - type: Resource
      resource: { name: memory, target: { type: Utilization, averageUtilization: 80 } }
# scale-down waits 5 min to avoid flapping; scale-up reacts within 60s
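The replica count this autoscaler aims for follows the documented HPA rule: desired = ceil(current × currentUtilization / targetUtilization), taking the largest proposal across metrics and clamping to the min/max bounds. The Python sketch below applies that rule to the 70% CPU and 80% memory targets. It is a simplification that leaves out stabilisation windows, unready pods, and missing metrics; the function name and the sample utilisation figures are illustrative.

```python
import math

# Targets and bounds from the HPA manifest.
TARGETS = {"cpu": 70, "memory": 80}
MIN_REPLICAS, MAX_REPLICAS = 2, 5


def desired_replicas(current, utilization, tolerance=0.1):
    """Apply the HPA scaling rule per metric and keep the largest result."""
    proposals = []
    for metric, target in TARGETS.items():
        ratio = utilization[metric] / target
        if abs(ratio - 1.0) <= tolerance:
            # Close enough to target: this metric proposes no change.
            proposals.append(current)
        else:
            proposals.append(math.ceil(current * ratio))
    return max(MIN_REPLICAS, min(MAX_REPLICAS, max(proposals)))


if __name__ == "__main__":
    print(desired_replicas(2, {"cpu": 140, "memory": 40}))
    print(desired_replicas(2, {"cpu": 72, "memory": 78}))
    print(desired_replicas(4, {"cpu": 10, "memory": 10}))
```

Because the largest proposal wins, a CPU spike scales the Deployment up even when memory alone would suggest scaling down.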

k8s/ecr-secret-cronjob.yaml — keeping the registry secret alive

Amazon ECR auth tokens expire every 12 hours. A Kubernetes CronJob runs every 6 hours, re-authenticates with AWS, and rotates the ecr-registry-secret so pods can always pull the latest images without manual intervention.

schedule: "0 */6 * * *"   # every 6 hours
command: ["/bin/sh", "-c"]   # run the steps in one shell so $TOKEN is shared
args:
  - |
    set -e
    TOKEN=$(aws ecr get-login-password --region "$AWS_REGION")
    kubectl delete secret ecr-registry-secret --ignore-not-found -n portfolio
    kubectl create secret docker-registry ecr-registry-secret \
      --docker-server="$ECR_REGISTRY" --docker-username=AWS \
      --docker-password="$TOKEN" -n portfolio
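The choice of a 6-hour schedule against a 12-hour token lifetime can be checked with simple arithmetic. The sketch below computes the worst-case remaining validity of the token stored in the secret; the function names are illustrative, and it assumes each CronJob run succeeds unless counted as missed.

```python
TOKEN_LIFETIME_H = 12   # ECR login token validity, from the text
REFRESH_INTERVAL_H = 6  # CronJob schedule, from the text


def worst_case_margin(lifetime_h, interval_h, missed_runs=0):
    """Hours of validity left on the stored token just before the next refresh."""
    oldest_token_age = interval_h * (missed_runs + 1)
    return lifetime_h - oldest_token_age


def pull_always_possible(lifetime_h, interval_h, missed_runs=0):
    """True if the stored token never reaches its expiry between refreshes."""
    return worst_case_margin(lifetime_h, interval_h, missed_runs) > 0


if __name__ == "__main__":
    print(worst_case_margin(TOKEN_LIFETIME_H, REFRESH_INTERVAL_H))
    print(pull_always_possible(TOKEN_LIFETIME_H, REFRESH_INTERVAL_H, missed_runs=1))
```

On schedule, the stored token always has at least 6 hours left. A single missed run uses up that whole margin, which is one reason it helps that the pipeline also refreshes the secret on each deploy.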

Automation pipeline

How code ships

Every push triggers a Jenkins pipeline running on its own EC2 instance: build Docker images, push to Amazon ECR, then deploy straight to the Kubernetes cluster with kubectl — no manual steps after a push.

Jenkins pipeline — push to live

📦 Checkout (GitLab · webhook) → 🐳 Docker Build (frontend + backend) → 📤 Push to ECR (build-number + latest tag) → 🔑 Refresh Secret (ecr-registry-secret) → ☸️ kubectl apply (deploy + ingress) → 🚀 Rollout (rolling update · live)

Jenkinsfile — declarative pipeline (actual)

pipeline {
  agent any

  environment {
    AWS_REGION   = 'eu-north-1'
    ECR_REGISTRY = '<account-id>.dkr.ecr.eu-north-1.amazonaws.com'
    IMAGE_TAG    = "${env.BUILD_NUMBER}"
    FRONTEND_IMG = "${ECR_REGISTRY}/portfolio-frontend:${IMAGE_TAG}"
    BACKEND_IMG  = "${ECR_REGISTRY}/portfolio-backend:${IMAGE_TAG}"
  }

  stages {
    stage('Checkout') {
      steps { echo 'Checkout successful' }
    }

    stage('Docker Build') {
      steps {
        sh 'docker build -t $FRONTEND_IMG -t $ECR_REGISTRY/portfolio-frontend:latest ./frontend'
        sh 'docker build -t $BACKEND_IMG -t $ECR_REGISTRY/portfolio-backend:latest ./backend'
      }
    }

    stage('Push to ECR') {
      steps {
        sh '''
          aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_REGISTRY
          docker push $FRONTEND_IMG && docker push $ECR_REGISTRY/portfolio-frontend:latest
          docker push $BACKEND_IMG && docker push $ECR_REGISTRY/portfolio-backend:latest
        '''
      }
    }

    stage('Deploy to Kubernetes') {
      environment { KUBECONFIG = credentials('kubeconfig credential') }
      steps {
        sh '''
          kubectl create namespace portfolio --dry-run=client -o yaml | kubectl apply -f -
          kubectl create secret docker-registry ecr-registry-secret -n portfolio \
            --docker-server=$ECR_REGISTRY --docker-username=AWS \
            --docker-password="$(aws ecr get-login-password --region $AWS_REGION)" \
            --dry-run=client -o yaml | kubectl apply -f -
          kubectl apply -f k8s/deployment.yaml -n portfolio
          kubectl apply -f k8s/service.yaml -n portfolio
          kubectl apply -f k8s/ingress.yaml -n portfolio
          kubectl rollout restart deployment/portfolio-frontend deployment/portfolio-backend -n portfolio
          kubectl rollout status deployment/portfolio-frontend -n portfolio --timeout=180s
          kubectl rollout status deployment/portfolio-backend -n portfolio --timeout=180s
        '''
      }
    }
  }

  post {
    success { echo '✅ GitLab → Jenkins → ECR → Kubernetes → HTTPS Ingress' }
    failure { echo '❌ Pipeline failed' }
  }
}
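The tagging scheme in the pipeline's environment block gives every image a fixed build-number tag plus a moving `latest` tag. A minimal Python sketch of that naming, useful for reasoning about which references exist after a given build, is shown here. The function name is illustrative and `registry.example` stands in for the real registry host.

```python
SERVICES = ("portfolio-frontend", "portfolio-backend")


def image_refs(registry, build_number):
    """Return the image references pushed for one build, per service."""
    refs = {}
    for service in SERVICES:
        refs[service] = [
            f"{registry}/{service}:{build_number}",  # fixed, traceable to a build
            f"{registry}/{service}:latest",          # moves with every push
        ]
    return refs


if __name__ == "__main__":
    # Hypothetical registry host and build number.
    for service, refs in image_refs("registry.example", 42).items():
        print(service, refs)
```

Keeping the build-number tag alongside `latest` means any earlier build can still be referenced by its own tag after newer images are pushed.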

Why the pipeline refreshes the ECR secret on every run

Amazon ECR login tokens expire after 12 hours, so a long-lived Kubernetes cluster can silently lose the ability to pull new images. Every deploy regenerates ecr-registry-secret before applying manifests, and a CronJob refreshes it again every 6 hours independently — so deployments stay reliable even between pipeline runs.

frontend/Dockerfile — multi-stage build, non-root user

# Stage 1: build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Stage 2: serve with nginx (non-root)
FROM nginx:1.25-alpine
COPY --from=builder /app/build /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
# nginx also needs write access to its cache directory and pid file when not running as root
RUN addgroup -S appgroup && adduser -S appuser -G appgroup \
 && touch /var/run/nginx.pid \
 && chown -R appuser:appgroup /usr/share/nginx/html /var/cache/nginx /var/run/nginx.pid
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["nginx", "-g", "daemon off;"]

docker-compose.yml — single-host fallback deployment

Alongside the Kubernetes path, the repo also ships a Docker Compose configuration so the same images can run directly on a single EC2 host behind Nginx — useful for quick recovery or a lower-cost fallback environment.

services:
  nginx:
    image: nginx:1.25-alpine
    ports: ["80:80", "443:443"]
    depends_on: [frontend, backend]

  frontend:
    image: ${ECR_REGISTRY}/portfolio-frontend:${IMAGE_TAG:-latest}
    expose: ["3000"]
    healthcheck: { test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"] }

  backend:
    image: ${ECR_REGISTRY}/portfolio-backend:${IMAGE_TAG:-latest}
    expose: ["5000"]
    environment:
      - AWS_REGION=${AWS_REGION:-eu-west-2}
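The `${IMAGE_TAG:-latest}` form in the Compose file falls back to a default when the variable is unset or empty. The Python sketch below imitates that one substitution rule to show how the image reference resolves. It handles only `${VAR}` and `${VAR:-default}`, not the rest of Compose's interpolation syntax, and `registry.example` is a placeholder.

```python
import re

# Matches ${NAME} and ${NAME:-default}.
_VARIABLE = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def interpolate(text, env):
    """Substitute variables, using the default when a value is unset or empty."""

    def replace(match):
        value = env.get(match.group("name"), "")
        if value == "" and match.group("default") is not None:
            return match.group("default")
        return value

    return _VARIABLE.sub(replace, text)


if __name__ == "__main__":
    image = "${ECR_REGISTRY}/portfolio-frontend:${IMAGE_TAG:-latest}"
    # Hypothetical registry host; IMAGE_TAG is left unset on purpose.
    print(interpolate(image, {"ECR_REGISTRY": "registry.example"}))
    print(interpolate(image, {"ECR_REGISTRY": "registry.example", "IMAGE_TAG": "42"}))
```

Leaving `IMAGE_TAG` unset therefore runs whatever `latest` points to, while setting it pins the fallback environment to a specific build.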

Senior DevOps Engineer

This portfolio is the proof.
