Business Area: Engineering
Seniority Level: Mid-Senior level
Job Description:
Our Data Services Pillar is the heart of data innovation. We don’t just work with technology; we build it. Our mission is to empower data practitioners by creating seamless, enterprise-grade experiences for data engineering, warehousing, streaming, operational databases, and AI.
Join Cloudera’s Machine Learning Platform team as a Staff Software Engineer. You will be a core technical leader responsible for designing, building, and delivering our next-generation AI and MLOps platform: an enterprise-grade platform that enables enterprises to create, deploy, and orchestrate agentic applications and multi-agent platforms using foundation models with enterprise data at scale in a hybrid cloud environment.
As a Staff-level engineer, you will drive technical architecture, advocate for engineering best practices, and collaborate closely with cross-functional teams (Product, Design, Frontend, and Field Engineering) to enhance developer velocity and platform agility.
Our Core Tech Stack:
Backend: Python, gRPC, SQL
Infrastructure: Kubernetes, Knative, KEDA, Docker, Hybrid Cloud (AWS, GCP, Azure, OpenShift)
GenAI & ML: LangChain, CrewAI, LlamaIndex, Closed and open source LLMs
Data & Vector DBs: Qdrant, Pinecone or Milvus, Redis, Postgres
In this role you will:
Architect and Build: Design, code, and implement elegant, scalable, enterprise-quality platform and application services / SDKs that support autonomous agents, multi-agent collaboration, and advanced tool-use capabilities.
Implement Agentic Evaluation & Observability: Build automated evaluation pipelines and trajectory tracking to continuously benchmark, audit, and evaluate agent reasoning loops, tool-calling accuracy, and guardrail compliance.
Architect Memory Management Systems: Develop and optimize robust stateful memory architectures for enterprise agents, handling short-term context window strategies, long-term semantic/episodic memory persistence, and secure cross-session state management.
Enable RAG use cases:
Lead by Example: Advocate for and establish engineering best practices, coding standards, and rigorous system design methodologies.
Enhance Platform Velocity: Work to enhance developer velocity, framework abstraction, and team agility across the AI platform ecosystem.
Collaborate Cross-Functionally: Build strong technical relationships and collaborate with platform and UI engineers, data scientists, quality engineers, UX designers, Product Management, Field Engineering, and external enterprise partners to deliver cohesive user experiences.
Mentorship: Act as a senior technical leader on the team, mentoring junior engineers and actively contributing to a culture of engineering excellence and craftsmanship.
What You Bring (Required Experience):
Experience: 8+ years of software engineering experience building scalable backend microservices and distributed systems.
Core Languages: Deep expertise in Python, Go, Java, or C#/C++, along with gRPC and SQL.
Cloud & Containerization: Extensive hands-on experience designing and developing microservices on Kubernetes, plus expertise in at least one major cloud platform (AWS, GCP, or Microsoft Azure).
Agentic Frameworks & Evaluation: Proven experience with open-source agentic frameworks (e.g., LangGraph, AutoGen, CrewAI) and evaluation tooling (LangSmith/Langfuse or Phoenix).
Agentic & Generative AI Mastery: Proven experience building applications utilizing advanced LLM orchestration paradigms (e.g., ReAct loops, planning/reflection frameworks, multi-agent systems) alongside standard foundational models, prompt engineering, and RAG architectures using vector databases (e.g., Pinecone, Milvus).
Memory & State Infrastructure: Deep understanding of caching layers, relational databases, and vector databases optimized for agentic state persistence and memory retrieval.
System Design: Demonstrated ability to go deep into complex distributed systems, crafting both high-level architecture and low-level technical designs.
Education: BS/MS in Computer Science, Software Engineering, or a related field (or equivalent professional experience).
Soft Skills: Self-driven with a strong sense of ownership, paired with excellent written and verbal communication skills.
Bonus Points (Preferred Experience)
AI/ML Orchestration: Experience with ML orchestration and serving software (KServe, Knative).
Big Data: Familiarity with distributed data technologies like Apache Spark, Hive, etc.
Full-Stack Exposure: Experience with React, HTML, and CSS to better collaborate with UI teams.
Data Science Ecosystem: Experience building applications alongside data scientists using tools like Python, TensorFlow, PyTorch, MLflow, or R.
Agile Environments: A proven track record of collaborating with agile teams across geographically dispersed and remote locations.
This role is not eligible for immigration sponsorship or relocation.
What you can expect from us:
Generous PTO Policy
Support for work-life balance with Unplugged Days
Flexible WFH Policy
Mental & Physical Wellness programs
Phone and Internet Reimbursement program
Access to Continued Career Development
Comprehensive Benefits and Competitive Packages
Employee Resource Groups
EEO/VEVRAA