We are building the next-generation Agent Brain.

This isn't just another "wrapper around LLM APIs with a chat UI." It is a purpose-built Agent architecture for complex knowledge work. We target high-value domains such as IP and R&D, where tasks are long-chain, data sources are heterogeneous, and evidence requirements are strict, so outcomes must be verifiable rather than merely conversational.

You will be building a reusable Agent infrastructure and a reasoning/orchestration kernel. Your core focus will be solving three major challenges:

How the Agent Runs (Execution Engine): The mechanics of the agent loop, including multi-step reasoning cycles, middleware pipelines, Planning & SubAgent orchestration, checkpointing and state recovery, and execution controls (permissions, cost, clarification).
How the Model Thinks per Turn (Context Engineering): It's not about stuffing the context window; it's about organizing attention. You will solve core problems such as input standardization, deciding exactly which capabilities and state to expose on each turn, compressing long histories into effective working memory, structured degradation under budget constraints, and normalizing tool outputs into traceable evidence.
What the Agent Can Use (Capability Foundation): Sandbox environments (Docker / K8s), Memory Store, MCP Hub, Skills Engine, File System & Upload Pipelines, multi-tenant isolation, security, and observability.

If you have a long-term passion for transforming raw model capabilities into robust systems capable of reliably executing complex tasks, this role is a perfect match.

Responsibilities:

Architecture Leadership & Evolution: Lead the architecture design of the next-generation Agent infrastructure for complex knowledge work. Define the boundaries between, and the collaboration mechanisms among, three core layers: Execution Engine, Reasoning/Routing, and Foundational Infrastructure. Ensure high availability and long-term evolvability while meeting strict requirements for low hallucination.
Execution Engine Development: Design and implement the Agent Loop mechanism and middleware pipelines. Drive the execution and orchestration of Planning and SubAgents (task decomposition, concurrency control). Build mechanisms for checkpointing, interrupt recovery, self-healing on failure, and permission/cost management so that long-chain, complex tasks execute reliably.
Reasoning & Routing (Context Engineering): Own the core routing logic. Design strategies for input standardization, dynamic capability views, and tiered budget governance (structured degradation). Build structured task workspaces that handle long-history compression, tool-result normalization, and traceability, so that context is organized dynamically and efficiently on every turn.
Foundational Infrastructure: Lead the development of secure sandbox isolation environments (Docker/K8s/AST), multi-tier Memory Stores, MCP (Model Context Protocol) Hubs, skill engines, and file workflow pipelines. Harden multi-tenant isolation and close the observability loop across the system.
Team Management & Technical Enablement: Act as a technical pillar by writing code for core modules yourself. Own technical decomposition, code reviews, automated evaluation loops (Eval System), and key technical decisions. Drive the abstraction of business scenarios into reusable foundational infrastructure capabilities.

Qualifications:

Experience: 5+ years of software engineering experience, including leading the design and implementation of complex systems (beyond simple CRUD or basic AI API wrappers). Proven Tech Lead / Staff experience, including managing an engineering team of at least 3 members.
Core Tech Stack: Solid Python engineering skills and the ability to independently own core modules. Proficient in general engineering patterns such as streaming output, asynchronous concurrency, and multi-model routing. Able to balance system stability, security, cost, and latency.
AI/Agent Expertise: Deep hands-on engineering experience with Agent/LLM systems, with strong expertise in at least two of the following three areas:
- Execution Engine: Multi-step reasoning loops, tool lifecycles, Planning/SubAgent orchestration, interrupt recovery, and self-healing.
- Reasoning & Routing: Input standardization, Context Assembly, budget governance, Provider Shaping, and task phase modeling.
- Foundational Infrastructure: Sandbox isolation, Memory/Retrieval, MCP, file systems, multi-tenant isolation, and security observability.
Portfolio (Hard Requirement): A representative portfolio piece in AI Coding, Agents, or Developer Tools (shipped product, open-source project, or high-quality demo). It must show that you personally coded critical core modules and solved real engineering challenges under real usage and performance/cost constraints.

Lead / Staff Engineer, AI Agent Platform
