Aron Day | Site Reliability Engineer

# about

whoami

Site Reliability Engineer making self-hosted deployments bulletproof at Tines. A decade of SRE battle scars, a detour through Customer Success and Sales Engineering, and a habit of turning operational pain into automated solutions. I build the kind of infrastructure that lets customers sleep through the night and ship with confidence.

Deep SRE and DevOps expertise: I live for defining SLIs/SLOs, automating infrastructure with Terraform, building observability stacks that cut MTTD/MTTR, and designing Kubernetes architectures that don't page you at 3am. GitOps workflows, progressive delivery (canary/blue-green), and security guardrails baked in from day one. Automation geek who believes if you do it twice, you automate it.

Away from the keyboard, I race a #95 Global Light on circuits across Ireland and the UK.

# experience

2026

Site Reliability Engineer

March 2026 - Current

▹Owning the self-hosted product experience end-to-end: making installs, upgrades, and day-two operations so smooth that customers forget there's a Helm chart under the hood.
▹Optimizing Kubernetes deployments for reliability and resource efficiency. HPA tuning, affinity rules, resource quotas, and all the unglamorous YAML that keeps self-hosted customers happy and their clusters healthy.
▹Building observability that works everywhere, including air-gapped environments where "just use SaaS" isn't an option. OpenTelemetry for metrics and traces, Fluentd for logs, and dashboards that actually answer questions.
▹Dogfooding Tines to automate our own infrastructure: deployment pipelines, environment management, and DevOps tooling that proves the product works because we run it ourselves.
▹Mentoring engineers across the org on Kubernetes and container orchestration best practices. Helping Sales talk confidently about self-hosted, Support debug tricky cluster issues, and Engineering build features that work seamlessly across cloud and on-prem.
▹Acting as the subject matter expert when critical self-hosted customer issues hit. The kind of deep-dive troubleshooting where you need someone who's lived through enough production fires to stay calm and methodical.

2024

Customer Success Engineer

June 2024 - March 2026

▹Building production-ready automation workflows that actually work in the real world. Connecting customer tools via APIs and turning "it would be cool if..." conversations into deployed solutions.
▹Living in customer environments: running working sessions, building success plans, and having the kinds of honest technical conversations that help teams ship faster and sleep better.
▹Doing the hands-on work that matters. Custom integrations, playbooks, and repeatable patterns that compress time-to-value from "eventually" to "next sprint."
▹Using a decade of SRE battle scars to help self-hosted customers design rock-solid deployments: HA architectures, resource tuning, upgrade automation, and all the operational goodness that prevents 2am pages.
▹Creating the docs and patterns that help teams help themselves. Because the best kind of support is the kind you don't need to ask for.
▹Being the bridge between customers and product. Turning field experience into features and one-off hacks into platform capabilities that everyone benefits from.

2022

Solution Architect / Sales Engineer

June 2022 - June 2024

▹Learned to translate years of SRE expertise into stories that actually resonated with customers. Designed observability strategies that gave teams visibility they didn't know they were missing.
▹Had real talk with DevOps and SRE teams about the pain of alert fatigue, mystery latency spikes, and "works on my machine" incidents. Then showed them how to fix it.
▹Became the technical voice in the room for everything from hands-on-keyboard demos to C-level conversations about why reliability isn't just an ops problem.
▹Hit the conference circuit talking about modern observability. Not death by PowerPoint, but actual war stories about SLOs, error budgets, and progressive delivery.
▹Worked closely with Sales, Product, and Engineering to shape what we built and how we sold it. Because the best products come from listening to what breaks in production.
▹Helped customers find and fix performance issues before they became incidents. The kind of proactive work that makes you look like a wizard but is really just good instrumentation.

2021

Senior Site Reliability Engineer

July 2021 - June 2022

▹Ran DocuSign AI infrastructure across GCP and AWS. Full Terraform and Ansible, because clicking through consoles is how you get configuration drift and sadness.
▹Built our observability stack with Prometheus, Grafana, Loki, and Tempo. Embraced the SRE mantra that if you can't measure it, you can't improve it (or debug it at 3am).
▹Owned our internal platform tooling: Vault for secrets, Consul for service mesh, Ansible Tower and Jenkins for automation. The unglamorous infrastructure that lets developers move fast.
▹Led the Kubernetes migration from Docker Swarm. Lots of whiteboarding with engineering teams, removing blockers, and figuring out the "how do we actually do this" details.
▹Served incident commander duty for production fires. Learned that good incident response is 20% technical skill and 80% clear communication and not panicking.

2019

Senior Site Reliability Engineer

May 2019 - June 2021

▹Managed 25+ Kubernetes clusters in production. Enough to get really good at patterns, automation, and not doing things manually because that way lies madness.
▹Went deep on AWS: VPC networking, IAM policies, Route 53, S3, ECR, ELB, EC2, RDS, Lambda, EKS. Basically lived in the console until I could do most of it with Terraform in my sleep.
▹Built observability for seriously distributed systems using Elasticsearch and LogicMonitor. Learned to find needles in haystacks and turn logs into insights.
▹Worked on a global SRE team doing follow-the-sun coverage. Responding to incidents across time zones, including plenty of nights and weekends (character building!).
▹Lived the SRE lifestyle: writing SLOs, defending error budgets, running postmortems, and constantly asking "how do we make this more reliable without slowing down shipping?"

2018

System Platform Engineer / DevOps Engineer

April 2018 - April 2019

▹Jumped into DevOps at a payments giant. This is where I first experienced the shift from "submit a ticket and wait" to "infrastructure as code and ship it."
▹Learned what enterprise scale really means: strict compliance, defense-in-depth security, and the kind of availability requirements where every nine counts.
▹Got comfortable working across teams and time zones. Turns out good documentation and async communication are superpowers.

2016

Infrastructure Technology Senior Analyst

May 2016 - April 2018

2007

IT System Administrator / IT Project Manager

March 2007 - May 2016

# competencies

Customer Success Engineering

Onboarding, Adoption, Value realization, Success plans/QBRs, Exec stakeholder management, Renewals & expansions

Sales Engineering

Discovery & solution design, Tailored demos/PoCs, ROI/TCO & RFPs, Pilot de-risking

DevOps & SRE

SLIs/SLOs, Error budgets, Incident response & postmortems, Release engineering, Reliability & agility improvements

Cloud Platforms

AWS, Azure, GCP, Kubernetes, Serverless, IaC (Terraform), GitOps

Automation & Orchestration

Event-driven workflows, API/webhook integrations, Conditional logic, CI/CD, Guardrails & reproducible deployments, Ansible, Chef, Runbooks, ChatOps

# contact

Let's Connect

Always open to discussing automation, cloud infrastructure, or how to turn complex challenges into scalable solutions.

hello@aron.day

dayaron

GitHub

aronday

Racing

#95 Global Light

Location

Dublin, Ireland