Careers · AI Lab
Senior ML Engineer
We are looking for:
A Senior ML Engineer who has already shipped LLM agents and RAG systems to production and can do more than “assemble a pipeline”: you improve quality through hypotheses and experimentation. We need someone with a research-minded engineering mentality: quickly test ideas, turn them into working prototypes, measure results, and then drive them to stable production solutions. Independence and ownership of outcomes are crucial: “find a lever → prove it with metrics → deploy → monitor.”
In practice, this means you’ll be doing things like:
- Designing quality evaluation for agents: defining metrics (task success, tool success, latency/cost, hallucination rate), building datasets, running offline/online evals, and setting up regressions and alerts;
- Determining whether a code agent can outperform the current tool-based agent: comparing approaches (tool-based vs. code-execution/codegen), defining “better,” running A/B tests or controlled rollouts, and evaluating quality/cost/risk;
- Improving RAG quality by 30%: enhancing retrieval (chunking, query rewriting, hybrid search), reranking, context composition, dedup/anti-leak, and grounding, then proving the gains on benchmarks and production metrics.
What matters to us:
- You’re not afraid of ambiguous problems, where there’s no ready-made solution and you must define what to measure, how to test, and what success means.
- You can balance speed and quality: experiment quickly while keeping reliability, observability, and reproducibility in mind.
- You can write (and vibe-code) production-friendly code - the kind that doesn’t make product engineers reach for their revolver.
Requirements:
- 5+ years of overall software engineering experience;
- 2+ years of experience building products around LLMs / agents;
- You have built at least:
  - One RAG / search system with a full pipeline: retrieval → rerank → generation;
  - One agent (tool-use / multi-step / workflows);
  - Both systems have real users;
- You are comfortable with:
  - Asynchronous and concurrent Python: asyncio, threads;
  - LLM prompting: system/user prompts, few-shot, templates, context, instructions;
  - Modern LLM internals: transformers, training, inference, serving;
  - Recent models and their differences (quality / speed / context length / cost / multimodality, etc.);
  - MCP (Model Context Protocol);
  - ML methodology: train/val/test, metrics, basic evaluation principles;
  - Quality control for agent responses:
    - Monitoring, metrics, guardrails, regressions, alerts, human labeling / feedback loops;
  - Frameworks and approaches for agents:
    - fastmcp, mcp-use, OpenAI Agents SDK and equivalents;
  - Tokenization:
    - How tokenization works;
    - Modern tokenizers and their impact on context length / cost / limits;
  - RAG pipelines:
    - Components (ingest / chunking / embeddings / vector store / retrieval / rerank / context composition / generation);
    - Typical issues and solutions (hallucinations, poor retrieval, degradation, cold start, data drift, duplicates, latency);
  - Cursor and similar tools (Claude Code, Codex, Aider, etc.):
    - How to use code agents effectively in development.
Nice-to-haves:
- Experience designing APIs: REST / gRPC / GraphQL;
- Solid understanding of HTTP;
- Experience with relational databases: PostgreSQL and similar;
- Knowledge of distributed and vector stores: Weaviate, Cassandra, etc.;
- Experience with Python API frameworks: FastAPI, Flask, and equivalents;
- Familiarity with background task systems: Celery, Taskiq, Airflow, etc.;
- Containerization and orchestration skills: Docker, Kubernetes, or Nomad;
- Experience working with queues and brokers: Kafka, RabbitMQ, Redis, etc.
We offer:
- Flexible schedule - you choose when to start your day;
- Relocation to Bilbao, Spain for you and your family, with full support at every stage (documents, housing, adaptation, even pets);
- Unlimited vacation - take time off when you really need it;
- A culture of trust and respect for professionals, with no micromanagement;
- A modern office in a cozy district, regular team events and off-sites;
- Room to grow, a voice in architectural decisions, challenging tasks, and a strong team you can learn from.