Dear applicants, please note that applications without stated salary expectations and an active LinkedIn profile will not be considered.
Thank you for your understanding.
Location: Remote (LATAM)
Employment Type: Full-Time
Experience: 3+ years
Visa: Not applicable
The company is a pre-seed AI startup building high-performance agent systems for the global entertainment and creator economy. The platform is already in private beta with major agencies and entertainment organizations. The company is early-stage, high-velocity, and deeply technical, with strong distribution advantages and early traction. The engineering culture is focused on shipping production-grade AI systems, not demos.
This role is strictly focused on agent reliability. We are not hiring a generalist full-stack engineer. The mission is to turn multi-agent systems from “works in demo” into “survives non-deterministic production traffic.” You will own the reliability layer end-to-end.
We are hiring someone who lives inside:
- Evaluation pipelines
- Observability systems
- LLM failure modes
- Agent reliability
- Production traffic debugging
What You’ll Own
- Instrument agents with trace-level observability (Langfuse or equivalent)
- Build evaluation datasets from production traffic
- Design and operate scoring pipelines (quality + robustness scoring)
- Stand up and operate Braintrust or an equivalent evaluation framework
- Build feedback loops from eval → prompt → architecture → re-evaluation
- Implement DSPy / DPO optimization as the system matures
- Partner with technical leadership on reliability-sensitive architecture decisions
- Own dev → test → deploy → on-call support for your workstream
Must-Have Requirements
- Real production experience with non-deterministic LLM systems (NOT side projects)
- Experience building or operating evaluation pipelines
- Strong Python backend skills (FastAPI / Django / Celery)
- Observability experience (Langfuse / Grafana / Loki or equivalent)
- Experience owning systems end-to-end
- Async communication skills + fluent English
- Based in LATAM
Nice-to-Have
- DSPy / DPO experience
- LangGraph or agentic workflow systems
- Multi-model production stacks (GPT-4.x / Claude / Whisper)
- Experience at high-signal LATAM companies or YC-backed startups