Principal Product Manager - Platform and ML Infra (AI/ML), Annapurna Labs
Company: Amazon
Location: Cupertino
Posted on: April 4, 2026
Job Description:
AWS Trainium is deployed at scale, with millions of chips in
production, and has been used for training and inference of
frontier models. AWS Neuron is the software stack for Trainium,
enabling customers to run deep learning and generative AI workloads
with optimal performance and cost efficiency. AWS Neuron is hiring
a Technical Product Manager to work backward from Trainium
customers and drive the developer experience for running
high-performance ML workloads at scale on AWS Trainium, from
getting started with Neuron Deep Learning Containers, AMIs, and AWS
services to operating at scale through orchestration, resiliency,
and observability. You will drive the product strategy for how
developers interact with Trainium through container ecosystems,
resource management platforms, and AWS services. This includes
Neuron integration with orchestration tools (SLURM, Kubernetes),
AWS services (EKS, SageMaker), Neuron Deep Learning Containers and
AMIs, and Linux distribution support. You will also drive the
strategy for resiliency and observability tools that enable system
diagnostics, performance monitoring, health monitoring, automated
recovery, and telemetry, allowing customers to operate AI training
and inference workloads with maximum uptime and efficiency, as well
as how the Neuron Runtime System interacts with ML frameworks to ensure
scalable, high-performance execution of models. To be successful in
this role, you will partner with engineering teams and PMs
responsible for training, inference, and performance tools,
Marketing, Business Development, and Solution Architects supporting
customers, and develop deep knowledge and understanding of Trainium
Architecture and Neuron Runtime System (including Neuron Runtime
Library, Neuron Kernel Driver and Collective Communication Stack)
to effectively define product strategy and make informed technical
decisions.

The Ideal Candidate

The ideal candidate can balance competing customer priorities and
drive alignment across engineering and business stakeholders in a
fast-moving, early-stage product environment, with excellent written
and verbal communication abilities. The ideal candidate has
experience with:

* Technical product management for developer-facing runtime and
  infrastructure products
* Developer tools (SDKs, libraries, APIs) with focus on developer
  experience
* Resource management and orchestration systems (SLURM, Kubernetes
  schedulers)
* ML monitoring, observability, and resilience
* Distributed systems and high-performance computing (HPC)
  environments
* AWS cloud services and infrastructure

About AWS Neuron

AWS Neuron is the software
stack for running deep learning and generative AI workloads on AWS
Trainium and AWS Inferentia. It includes a compiler, runtime,
training and inference libraries, and developer tools for
monitoring, profiling, and debugging. Built on an open source
foundation, Neuron supports native PyTorch and JAX frameworks and
popular ML libraries without code modification. Neuron enables
rapid experimentation, distributed training across multiple chips
and nodes, and cost-optimized inference powered by optimized
kernels. For performance optimization, Neuron provides the Neuron
Kernel Interface (NKI) for direct hardware access and a suite of
profiling and debugging tools.

Key job responsibilities

Product Strategy & Vision: Own product strategy and roadmap. Guide
trade-offs between performance, scalability, and developer
experience. Write PRFAQs and PRDs.

Customer Discovery: Understand deployment challenges, orchestration
needs, and infrastructure pain points. Represent customer needs in
executive prioritization.

Technical Leadership: Drive alignment across Neuron components
(Runtime, Kernel Driver, Collective Communication, container
infrastructure) and AWS services. Partner with training, inference,
and performance PMs. Write user stories and define success metrics.

Impact: Enable customers (Anthropic, Databricks, AWS teams) to
deploy, monitor, and operate ML workloads at scale through
container orchestration, resource management, health monitoring,
and observability.

About the team

Our team is dedicated to
supporting new members. We have a broad mix of experience levels
and tenures, and we're building an environment that celebrates
knowledge-sharing and mentorship. Our senior members enjoy
one-on-one mentoring and thorough, but kind, code reviews. We care
about your career growth and strive to assign projects that help
you develop your engineering expertise so you feel
empowered to take on more complex tasks in the future.

Diverse Experiences

AWS values diverse experiences. Even if you do not meet
all of the qualifications and skills listed in the job description,
we encourage candidates to apply. If your career is just starting,
hasn't followed a traditional path, or includes alternative
experiences, don't let it stop you from applying.

Inclusive Team Culture

Here at AWS, it's in our nature to learn and be curious.
Our employee-led affinity groups foster a culture of inclusion that
empowers us to be proud of our differences. Ongoing events and
learning experiences, including our Conversations on Race and
Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop
embracing our uniqueness.

Work/Life Balance

We value work-life
harmony. Achieving success at work should never come at the expense
of sacrifices at home, which is why we strive for flexibility as
part of our working culture. When we feel supported in the
workplace and at home, there's nothing we can't achieve in the
cloud.

Mentorship & Career Growth

We're continuously raising our
performance bar as we strive to become Earth's Best Employer.
That's why you'll find endless knowledge-sharing, mentorship and
other career-advancing resources here to help you develop into a
better-rounded professional.

About Amazon Annapurna Labs

The Amazon Annapurna Labs team (our organization within AWS UC) is
responsible
for building innovation in silicon and software for our AWS
customers. We are at the forefront of innovation by combining cloud
scale with the world's most talented engineers. Our team covers
multiple disciplines including silicon engineering, hardware
design, software and operations. Because of our team's breadth of
talent, we have been able to improve AWS cloud infrastructure in
high-performance machine learning with AWS Neuron, Inferentia and
Trainium ML chips, in networking and security with products such as
AWS Nitro, Elastic Network Adapter (ENA), and Elastic Fabric
Adapter (EFA), and in computing with AWS Graviton and F1 EC2
instances.

About AWS Utility Computing (UC)

AWS Utility Computing
(UC) provides product innovations that continue to set AWS's
services and features apart in the industry. As a member of the UC
organization, you'll support the development and management of
Compute, Database, Storage, Platform, and Productivity Apps
services in AWS, including support for customers who require
specialized security solutions for their cloud services.
Additionally, this role may involve exposure to and experience with
Amazon's growing suite of generative AI services and other cloud
computing offerings across the AWS portfolio.

About AWS

Amazon Web
Services (AWS) is the world's most comprehensive and broadly
adopted cloud platform. We pioneered cloud computing and never
stopped innovating — that's why customers from the most successful
startups to Global 500 companies trust our robust suite of products
and services to power their businesses.

- Bachelor's degree
- Experience owning/driving roadmap strategy and definition
- Experience with feature delivery and tradeoffs of a product
- Experience in technical product management
- Experience working directly with Engineers on product enhancements
- Experience in project management methodologies, business analysis,
  or process improvement

Amazon is an equal opportunity employer and does not
discriminate on the basis of protected veteran status, disability,
or other legally protected status.

Los Angeles County applicants:
Job duties for this position include: work safely and cooperatively
with other employees, supervisors, and staff; adhere to standards
of excellence despite stressful conditions; communicate effectively
and respectfully with employees, supervisors, and staff to ensure
exceptional customer service; and follow all federal, state, and
local laws and Company policies. Criminal history may have a
direct, adverse, and negative relationship with some of the
material job duties of this position. These include the duties and
responsibilities listed above, as well as the abilities to adhere
to company policies, exercise sound judgment, effectively manage
stress and work safely and respectfully with others, exhibit
trustworthiness and professionalism, and safeguard business
operations and the Company’s reputation. Pursuant to the Los
Angeles County Fair Chance Ordinance, we will consider for
employment qualified applicants with arrest and conviction records.
Our inclusive culture empowers Amazonians to deliver the best
results for our customers. If you have a disability and need a
workplace accommodation or adjustment during the application and
hiring process, including support for the interview or onboarding
process, please visit
https://amazon.jobs/content/en/how-we-hire/accommodations for more
information. If the country/region you’re applying in isn’t listed,
please contact your Recruiting Partner. The base salary range for
this position is listed below. Your Amazon package will include
sign-on payments and restricted stock units (RSUs). Final
compensation will be determined based on factors including
experience, qualifications, and location. Amazon also offers
comprehensive benefits including health insurance (medical, dental,
vision, prescription, Basic Life & AD&D insurance and option
for Supplemental life plans, EAP, Mental Health Support, Medical
Advice Line, Flexible Spending Accounts, Adoption and Surrogacy
Reimbursement coverage), 401(k) matching, paid time off, and
parental leave. Learn more about our benefits at
https://amazon.jobs/en/benefits.

USA, CA, Cupertino - 206,900.00 - 279,900.00 USD annually
USA, WA, Seattle - 179,900.00 - 243,400.00 USD annually