Agentic AI Application Tester

PM Consulting
1 day ago
Full-time
On-site
Taguig, Philippines

ROLE SUMMARY
We are looking for an Agentic AI Application Tester to lead quality assurance and testing for AI-powered workflow systems. This role bridges software quality engineering and AI evaluation — you will need both technical testing proficiency and a practical understanding of how agent-based, LLM-driven systems behave across the full lifecycle, from development through live production.

You will define evaluation strategies, build and maintain automated test infrastructure, monitor production health, and collaborate with engineering teams to continuously improve the safety, reliability, and performance of AI-enabled applications.

KEY ACCOUNTABILITIES

• Own the definition and execution of testing and quality assurance strategies for AI-enabled workflows
• Continuously evaluate and monitor system behavior in live production environments
• Support auditability, risk management, and ongoing quality improvement initiatives across the organization

PRINCIPAL RESPONSIBILITIES

Quality Strategy & Evaluation Design

• Set quality criteria and testing strategies for agentic workflows, spanning accuracy, response latency, safety, compliance, and operational risk
• Design and build automated evaluation harnesses to measure agent performance across key dimensions, including hallucination rates, tool misuse, policy adherence, and task completion rates
• Apply LLM evaluation frameworks to track output quality, detect regressions, and identify system drift over time

Test Automation & Infrastructure
• Build and maintain automated test suites covering UI, API, and end-to-end workflow validation
• Develop custom evaluation scripts and tooling to support continuous quality assessment of agent behaviors
• Create and maintain runbooks for recurring failure modes and actively contribute to incident response processes

Production Monitoring & Reporting
• Implement and manage continuous monitoring systems to identify anomalies, quality degradation, and emerging safety issues in production
• Design and maintain dashboards and quality reports that surface meaningful metrics and trends for engineering teams and business stakeholders

Collaboration & Compliance
• Partner with developers to iterate on prompts, tool configurations, and workflow designs based on test findings and quality data
• Ensure all testing, logging, and monitoring activities meet data privacy, audit, and applicable regulatory standards

QUALIFICATIONS
Essential Experience

• Minimum 3 years of experience in QA, test automation, or DevOps, or at least 2 years of direct, hands-on experience testing AI- or ML-enabled systems
• Solid Python programming skills applied to test automation, evaluation harness development, and data analysis
• Strong attention to detail, with a focus on identifying issues that materially affect system reliability and user trust
• Ability to work effectively in an environment where tools, frameworks, and best practices are actively evolving
• Collaborative approach, with a track record of using evidence-based findings to inform product and engineering decisions

Required Technical Skills
• Programming: Python — used for test automation, evaluation harness development, and data analysis tasks
• UI Automation: Playwright — for end-to-end testing of agent-driven workflow interfaces
• AI Evaluation: hands-on experience with LLM evaluation frameworks covering quality assessment, drift detection, and regression analysis
• Workflow Testing: API and agent workflow validation using custom-built scripts
• Monitoring: implementing production-grade quality monitoring and anomaly detection

Desirable Skills
• Familiarity with Python testing frameworks such as pytest or an equivalent
• SQL skills for querying logs, metrics, or evaluation datasets
• Experience with observability and monitoring tools such as Prometheus, Grafana, or comparable platforms
• Understanding of hallucination detection techniques and AI safety design principles
• Exposure to CI/CD pipelines and Git-based development workflows