
Cloud and DevOps

Platform Reliability Engineer

Remote - US


Full-time

Senior

Posted February 2, 2026

About this Position

Build client-critical software at KeY2Moon

At KeY2Moon Solutions, you will work on real client problems that affect revenue, operations, and customer experience. We combine agency speed with engineering discipline, so people who join us get broad ownership and measurable impact.

Direct exposure to product, architecture, and client decision-making

Our client, a digital subscription business, is scaling quickly, but its release windows trigger recurring incidents and rollback-heavy weekends for the internal team.

Its delivery pipeline was assembled piecemeal and lacks guardrails. We need a pragmatic engineer who can improve reliability without freezing product delivery.

You will redesign delivery controls, observability, and incident workflows so teams can ship often without breaking production.

Engagement Stack

AWS

Kubernetes

Terraform

GitHub Actions

Datadog

Responsibilities

Rework the release flow using GitHub Actions, Terraform, and Kubernetes rollout controls designed around the failure patterns the team actually experiences

Improve incident readiness through better service ownership, Datadog/Sentry observability, and runbook quality

Define practical reliability KPIs, drawn from AWS infrastructure, deployment, and error telemetry, that engineering and product can track together

Coach client squads on operational discipline, on-call readiness, and post-incident follow-through

Requirements

You have improved unstable pipelines in high-pressure environments using AWS, Kubernetes, and Infrastructure as Code

You can define reliability controls that teams adopt because they are practical for daily delivery, not just policy-compliant

You are strong at production troubleshooting across infra, application, and CI/CD layers with clear incident communication

You can turn recurring outage patterns into a preventive engineering backlog with measurable reliability outcomes

Nice to have

Experience in subscription or payment-heavy systems where uptime directly affects revenue

Experience running blameless postmortems with cross-functional technical and business teams

Experience mentoring product engineers in reliability fundamentals and release safety practices

Hiring process

1. Intro call with talent team (30 minutes)

2. Practical role interview focused on recent project work (60-90 minutes)

3. Final panel on collaboration, ownership, and client communication

4. Offer discussion and onboarding plan

Apply for this job

Submit your details and our hiring team will review your profile.



Contact us

Ready to transform your digital presence? Let's discuss your next project.

support@key2moon.com

+1 (214) 699-6387
+63 (997) 340-3269

© 2026 Copyright by KeY2Moon Solutions. All rights reserved.
