Dear applicants, please note that applications without salary expectations and an active LinkedIn profile will not be considered.

Thank you for your understanding.

Location: Remote (LATAM)
Employment Type: Full-Time
Experience: 3+ years
Visa: Not applicable

The company is a pre-seed AI startup building high-performance agent systems for the global entertainment and creator economy. The platform is already in private beta with major agencies and entertainment organizations. The company is early-stage, high-velocity, and deeply technical, with strong distribution advantages and early traction. The engineering culture is focused on shipping production-grade AI systems, not demos.

This role is strictly focused on agent reliability. We are not hiring a generalist full-stack engineer. The mission is to turn multi-agent systems from “works in demo” into “survives non-deterministic production traffic.” You will own the reliability layer end-to-end.

We are hiring someone who lives inside:

Evaluation pipelines
Observability systems
LLM failure modes
Agent reliability
Production traffic debugging

What You’ll Own

Instrument agents with trace-level observability (Langfuse or equivalent)
Build evaluation datasets from production traffic
Design and operate scoring pipelines (quality + robustness scoring)
Stand up and operate Braintrust or an equivalent evaluation framework
Build feedback loops from eval → prompt → architecture → re-evaluation
Implement DSPy / DPO optimization as the system matures
Partner with technical leadership on reliability-sensitive architecture decisions
Own dev → test → deploy → on-call sustain for your workstream

Must-Have Requirements

Real production experience with non-deterministic LLM systems (NOT side projects)
Experience building or operating evaluation pipelines
Strong Python backend (FastAPI / Django / Celery)
Observability experience (Langfuse / Grafana / Loki or equivalent)
Experience owning systems end-to-end
Async communication skills + fluent English
Based in LATAM

Nice-to-Have

DSPy / DPO experience
LangGraph or agentic workflow systems
Multi-model production stacks (GPT-4.x / Claude / Whisper)
Experience at high-signal LATAM companies or YC-backed startups

Applied AI Engineer (Agent Reliability) – LATAM (Remote)
