About the role
As a Site Reliability Architect (SRE), you will make an impact by building and leading a modern, AI-enabled SRE organization that improves the availability, performance, and resilience of large-scale retail and supply chain platforms. You will be a valued member of a global engineering leadership team and work collaboratively with product, infrastructure, cloud, and business stakeholders to drive a transition from legacy operations to an SLO-driven reliability culture.
**In this role, you will:**
Build and scale an enterprise SRE function from the ground up, defining standards, operating models, and career paths.
Own availability, latency, and performance for a complex omnichannel ecosystem, including high-traffic web applications, APIs, and GraphQL layers.
Define and execute a multi-year SRE strategy, transitioning legacy environments to modern, automation-first and SLO-based practices.
Lead infrastructure reliability across hybrid environments, bridging cloud-native platforms with on-prem retail store systems and thin/thick client architectures.
Design and operate scalable event-driven architectures, including high-throughput Kafka platforms supporting global inventory and POS systems.
Standardize enterprise observability using tools such as Dynatrace, New Relic, and Google Cloud Monitoring to enable proactive issue detection and faster incident resolution.
Architect and deploy AI-enhanced operations, leveraging LLMs, AI agents, and MCP-based workflows to automate root cause analysis, reduce toil, and enable self-healing systems.
Partner with engineering, vendors, and external partners to align reliability goals with overall business outcomes.
**Work model**
We strive to provide flexibility wherever possible. Based on this role's business requirements, this is a remote position open to qualified applicants in the United States. Regardless of your working arrangement, we are here to support a healthy work-life balance through our wellbeing programs.
The working arrangements for this role are accurate as of the date of posting. This may change based on project, business, or client requirements. We will always be clear about role expectations.
**What you need to have to be considered**
12+ years of progressive experience across Site Reliability Engineering, DevOps, infrastructure, or platform engineering in large, distributed environments.
Demonstrated experience building and leading SRE organizations within complex enterprise or global environments.
Deep hands-on experience with cloud platforms (GCP preferred) or multi-cloud environments, including Kubernetes (GKE/EKS) and Infrastructure as Code (Terraform).
Strong knowledge of modern microservices and middleware technologies, including Kafka and GraphQL, operating at scale.
Proven ability to think strategically while operating hands-on, influencing cross-functional teams and senior stakeholders.
Experience managing vendors, partner teams, and third-party solutions within a broader product or platform portfolio.
Ability to translate complex technical concepts into clear business value for both engineering and non-technical stakeholders.
**These will help you stand out**
Experience supporting large-scale retail, e-commerce, or supply chain platforms with hybrid (cloud + on-prem) architectures.
Hands-on experience applying LLMs, AI agents, or automation frameworks to improve incident management and predictive maintenance.
Deep understanding of retail store networking, local hardware constraints, and thin/thick client models.
Successful track record driving cultural change toward reliability engineering, automation, and SLO-based operations.
Strong leadership presence with the ability to mentor senior engineers and develop high-performing global teams.
Please note: This role will require an in-person meet and greet at our Cognizant offices or client location.
Bachelor's degree in computer science, IT, or equivalent.
Applications will be accepted until April 21st, 2026.
Salary and Other Compensation:
The annual salary for this position is between $89,100 and $141,500, depending on the experience and other qualifications of the successful candidate.
This position is also eligible for Cognizant's discretionary annual incentive program, based on performance and subject to the terms of Cognizant's applicable plans.
Benefits: Cognizant offers the following benefits for this position, subject to applicable eligibility requirements:
· Medical/Dental/Vision/Life Insurance
· Paid holidays plus Paid Time Off
· 401(k) plan and contributions
· Long-term/Short-term Disability
· Paid Parental Leave
· Employee Stock Purchase Plan
Disclaimer: The salary, other compensation, and benefits information is accurate as of the date of this posting. Cognizant reserves the right to modify this information at any time, subject to applicable law.
Cognizant is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, color, religion, national origin, disability, protected Veteran status, age, or any other characteristic protected by law.
Job #NLX290622870