
AI Data Engineer

HyrEzy Talent Solutions
Full-time
Hybrid
Telangana, India

Role: AI Data Engineer

  • Location: Rai Durg, Hyderabad
  • Work mode: Hybrid (3 days per week from the office)
  • Experience: 5-8 years (minimum 5 years as an AI Data Engineer)
  • Mandatory Skills: DVC (Data Version Control) and Airflow; Apache Spark, Flink, and Kafka; advanced Python for AI logic plus Rust (or C++); vector database mastery, including configuration of HNSW indexes, scalar quantization, and metadata filtering strategies
  • Budget: 18 - 32 LPA
  • Qualification: Bachelor of Engineering / Bachelor of Technology (B.E./B.Tech.)
  • Notice period: Immediate / early joiners (Max. 15-30 days)
  • Interview Process: 2 - 3 Technical rounds

Important Note:

  • We are currently prioritizing immediate / early joiners (maximum 15-30 days; candidates with a notice period above 30 days will be automatically rejected).
  • All mandatory technical skills must be clearly highlighted within the project descriptions in your resume, not just listed in the Skills or Roles & Responsibilities sections.

Position Overview:

We are seeking a hardcore, hands-on AI Data Engineer to build the high-performance data infrastructure required to power autonomous AI agents. You won't just be moving data from A to B; you will be architecting Dynamic Context Windows, managing Real-time Semantic Indexes, and building Self-Cleaning Data Pipelines that feed our "Super Employee" agents.

Key Responsibilities:

  • Vector & Graph ETL: Design and maintain pipelines that transform unstructured data (PDFs, emails, logs, chats) into optimized embeddings for Vector Databases (Pinecone, Weaviate, Milvus).
  • Semantic Data Modeling: Engineer data structures that optimize for Retrieval-Augmented Generation (RAG), ensuring agents find the "needle in the haystack" in milliseconds.
  • Knowledge Graph Construction: Build and scale Knowledge Graphs (Neo4j) to represent complex relationships in our trading and support data that standard vector search misses.
  • Automated Data Labeling & Synthetic Data: Implement pipelines using LLMs to auto-label datasets or generate synthetic edge cases for agent training and evaluation.
  • Stream Processing for Agents: Build real-time data "listeners" (Kafka/Flink) that feed live context to agents, allowing them to react to market or support events as they happen.
  • Data Reliability & "Drift" Detection: Build monitoring for "Embedding Drift", identifying when the statistical distribution of your data changes and the agent's "knowledge" becomes stale (a minimal drift-check sketch follows this list).
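
For illustration, a minimal sketch of what such an embedding-drift check might look like in Python; the vector dimension, batch sizes, and alert threshold are assumptions for the example, not requirements of the role:

```python
import numpy as np

def embedding_drift_score(reference: np.ndarray, current: np.ndarray) -> float:
    """Crude drift signal: cosine distance between the mean embedding of a
    reference window and the mean embedding of the current window."""
    ref_mean = reference.mean(axis=0)
    cur_mean = current.mean(axis=0)
    cosine = np.dot(ref_mean, cur_mean) / (
        np.linalg.norm(ref_mean) * np.linalg.norm(cur_mean)
    )
    return 1.0 - float(cosine)

# Illustrative usage with two batches of 768-dim embeddings; the second
# batch is deliberately shifted to simulate a change in the data.
rng = np.random.default_rng(0)
reference_batch = rng.normal(0.0, 1.0, size=(1_000, 768))
current_batch = rng.normal(0.2, 1.0, size=(1_000, 768))

DRIFT_THRESHOLD = 0.05  # illustrative; tune against your corpus
score = embedding_drift_score(reference_batch, current_batch)
if score > DRIFT_THRESHOLD:
    print(f"Embedding drift detected (score={score:.3f}); re-index or re-embed.")
```

A production monitor would typically track per-dimension statistics or a population-level distance on a schedule, but the alerting shape is the same.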

Qualifications:

  • Vector Database Mastery: Expert-level configuration of HNSW indexes, scalar quantization, and metadata filtering strategies within Pinecone, Milvus, or Qdrant (an index-configuration sketch follows this list).
  • Advanced Python & Rust: Proficiency in Python for AI logic and Rust (or C++) for high-performance data processing and custom embedding functions.
  • Big Data Ecosystem: Hands-on experience with Apache Spark, Flink, and Kafka in a high-throughput environment (Trading/FinTech preferred).
  • LLM Data Tooling: Deep experience with Unstructured.io, LlamaIndex, or LangChain for document parsing and chunking strategy optimization.
  • MLOps & DataOps: Mastery of DVC (Data Version Control) and Airflow/Prefect for managing complex, non-linear AI data workflows.
  • Embedding Models: Understanding of how to fine-tune embedding models (e.g., BGE, Cohere, or OpenAI) to better represent domain-specific (Trading) terminology.
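
To make the index-configuration bullet concrete, here is a hedged sketch using the qdrant-client Python API (Qdrant being one of the stores named above); the collection name, vector size, payload field, and tuning values are illustrative assumptions, and the exact API surface may vary by client version:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # assumed local instance

# Collection with explicit HNSW parameters and int8 scalar quantization.
client.create_collection(
    collection_name="trading_docs",           # hypothetical collection
    vectors_config=models.VectorParams(
        size=768,                              # must match the embedding model
        distance=models.Distance.COSINE,
    ),
    hnsw_config=models.HnswConfigDiff(
        m=32,              # graph connectivity: higher recall, more memory
        ef_construct=256,  # build-time beam width: slower builds, better graph
    ),
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8,
            quantile=0.99,    # clip tail values before quantizing
            always_ram=True,  # keep quantized vectors in RAM for low latency
        )
    ),
)

# Metadata filtering narrows candidates during the HNSW traversal.
hits = client.search(
    collection_name="trading_docs",
    query_vector=[0.1] * 768,  # placeholder embedding
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="asset_class",             # hypothetical payload field
                match=models.MatchValue(value="equities"),
            )
        ]
    ),
    limit=5,
)
```

Raising m and ef_construct trades memory and build time for recall, while int8 scalar quantization cuts vector memory roughly 4x at a small recall cost; navigating that trade-off is the substance of this bullet.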

Additional qualifications:

  • Chunking Strategy Architect: You don't just "split text." You implement Semantic Chunking and Parent-Child retrieval strategies to maximize LLM context relevance.
  • Cold/Warm/Hot Storage Strategy: Managing cost and latency by tiering data between Vector DBs (Hot), SQL/NoSQL (Warm), and S3/Data Lakes (Cold).
  • Privacy & Redaction Pipelines: Building automated PII (Personally Identifiable Information) redaction into the ingestion layer to ensure agents never "see" or "leak" sensitive user data (see the sketch below).
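
As a sketch of the redaction idea, assuming simple regex rules (a real ingestion layer would pair rules like these with an NER-based PII detector):

```python
import re

# Illustrative patterns only; coverage and precision need tuning per data source.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{8,}\d"),
    "PAN":   re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),  # Indian PAN card format
}

def redact_pii(text: str) -> str:
    """Replace matched PII spans with typed placeholders before the text
    reaches chunking, embedding, or an agent's context window."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text

print(redact_pii("Reach Priya at priya.k@example.com or +91 98765 43210."))
# -> Reach Priya at [REDACTED_EMAIL] or [REDACTED_PHONE].
```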

Why Join?

  • Opportunity to lead transformative initiatives, modernizing legacy systems and shaping the future of trading technology.
  • Work with cutting-edge technologies in a dynamic, fast-paced environment.
  • Competitive compensation, professional growth opportunities, and the chance to work with industry-leading experts.