Senior AI Engineer, NeMo Retriever - Model Optimization and MLOps

Company: NVIDIA
Location: Santa Clara
Posted on: June 2, 2025

Job Description:

Locations: US, CA, Santa Clara; US, WA, Remote; US, CA, Remote; US, WA, Redmond; US, WI, Remote
Time type: Full time
Job requisition ID: JR1996904

NVIDIA's technology is at the heart of the AI revolution, touching people across the planet by powering everything from self-driving cars and robotics to co-pilots and more. Join us at the forefront of technological advancement in intelligent assistants and information retrieval.

NVIDIA NIM provides containers to self-host GPU-accelerated inferencing microservices for pre-trained and customized AI models across clouds, data centers, RTX AI PCs, and workstations. NIM microservices expose industry-standard APIs for simple integration into AI applications, development frameworks, and workflows. Built on pre-optimized inference engines from NVIDIA and the community, including NVIDIA TensorRT and TensorRT-LLM, NIM microservices optimize response latency and throughput for each combination of foundation model and GPU.

NVIDIA NeMo Retriever is a collection of NIMs for building multimodal extraction, re-ranking, and embedding pipelines with high accuracy and maximum data privacy. It delivers quick, context-aware responses for AI applications like advanced retrieval-augmented generation (RAG) and Agentic AI workflows.

The NeMo Retriever team is looking for an AI Engineer to join our team, focusing on the intersection of machine learning development, performance optimization, and MLOps. This role requires a unique blend of technical expertise in ML model development, system optimization, and operational excellence. We are looking for someone with a passion for working with the world's most complicated problems in Generative AI, LLM, MLLM, and RAG spaces using our innovative hardware and software platforms. You will leverage and augment existing tools that enable building NIMs, which power flexible, multi-modal retrievers and agents. If you're creative & passionate about solving real-world conversational AI problems, come join us.

What You'll Be Doing:

- Develop and maintain NIMs that containerize optimized models, following OpenAPI standards, using Python or an equivalent performant language.
- Work closely with partner teams to understand requirements, build and evaluate POCs, and develop roadmaps for production-level tools.
- Enable development of integrated systems (AI Blueprints) that provide a unified, turnkey experience.
- Help build and maintain our Continuous Delivery pipeline with the goal of moving changes to production faster and more safely while upholding key operational standards.
- Provide peer reviews to other specialists, including feedback on performance, scalability, and correctness.

What We Need To See:

- Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field (or equivalent experience).
- 8+ years of demonstrated experience in a similar or related role.
- Python programming expertise with Deep Learning (DL) frameworks such as PyTorch.
- Experience delivering software in a cloud context and familiarity with the patterns and processes of handling cloud infrastructure.
- Knowledge of MLOps technologies such as Docker-Compose, Containers, Kubernetes, Helm, data center deployments, etc.
- Familiarity with ML libraries, especially PyTorch, TensorRT, or TensorRT-LLM.
- Excellent, in-depth, hands-on understanding of NLP, LLM, MLLM, Generative AI, and RAG workflows.
- Self-starter with a passion for growth, enthusiasm for continuous learning, and sharing findings across the team
- Extremely motivated, highly passionate, and curious about new technologies.

With competitive salaries and a generous benefits package, NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. Due to unprecedented growth, our exclusive engineering teams are rapidly growing.

The base salary range is 184,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Other Engineering Jobs

Senior Machine Learning Engineer, Frameworks
Description: Senior Machine Learning Engineer, Frameworks. Palo Alto, CA / Ann Arbor, MI / Product Technology - AD/ADAS / Employee / hybrid. Woven by Toyota is the mobility technology subsidiary of Toyota Motor Corporation. (more...)
Company: Woven
Location: Palo Alto
Posted on: 06/4/2025

Network Security Engineer
Description: The technical storage or access is strictly necessary for the legitimate purpose of enabling the use of a specific service explicitly requested by the subscriber or user, or with (more...)
Company: Cistec
Location: Palo Alto
Posted on: 06/4/2025

L3 Network engineer- Hyderabad/Mumbai/Ahmedabad
Description: Required Skills: Routing, Switching, Wireless, Firewalls. Experience: 5 years of experience in network engineering, with a strong focus on wireless, switching, and firewalls, and at least 2-3 years in L3 (more...)
Company: Nipun Net Solutions Pvt Ltd
Location: Palo Alto
Posted on: 06/4/2025

Sr. Energy Storage Networks Engineer - REMOTE
Description: Sr. Energy Storage Network Engineer - Renewables. Location: FULL-TIME REMOTE, anywhere in the USA. This is an opportunity to join an industry-leading renewable (more...)
Company: ThinkBAC Consulting LLC
Location: Palo Alto
Posted on: 06/4/2025

Senior Network Engineer
Description: Our Brand: J.Jill is a national lifestyle brand that provides apparel, footwear and accessories designed to help its customers move through a full life with ease. The brand represents an easy, thoughtful, (more...)
Company: The J.Jill Group
Location: Palo Alto
Posted on: 06/4/2025

Senior C++ Engineer
Description: Started in 2021, Coram.AI is building the best business AI video system on the market. Powered by next-generation video artificial intelligence, we deliver unprecedented insights and 10x better user experience (more...)
Company: Coram AI co
Location: Palo Alto
Posted on: 06/4/2025

Senior Machine Learning Engineer, Perception
Description: Woven by Toyota is enabling Toyota's once-in-a-century transformation into a mobility company. Inspired by a legacy of innovating for the benefit of others, our mission is to challenge the current state of mobility through (more...)
Company: Woven by Toyota
Location: Palo Alto
Posted on: 06/4/2025

Internship, Fullstack Engineer, Tesla Bot Tooling (Summer 2025)
Description: Internship, Fullstack Engineer, Tesla Bot Tooling (Summer 2025). This position is expected to start in May/June and continue through Aug/Sep. Internships are in-person for 40 hours a week for a minimum (more...)
Company: Tesla, Inc.
Location: Palo Alto
Posted on: 06/4/2025

Staff Application Platform Engineer
Description: Assured is on a mission to modernize insurance. Claims processing (i.e., should we pay this claim?), while often overlooked, is the foundation of the entire industry. It's currently highly manual, involving (more...)
Company: Assured
Location: Palo Alto
Posted on: 06/4/2025

Automation Engineer, UI Quality Assurance (Remote)
Description: Machinify is the leading provider of AI-powered software products that transform healthcare claims and payment operations. Each year, the healthcare industry generates over 200B in claims mispayments, (more...)
Company: Machinify Inc.
Location: Palo Alto
Posted on: 06/4/2025
