C++ Developer || AI || HPC || Bangalore

Ksa
21 days ago
Full-time
On-site
Bengaluru, Karnataka, India

We are seeking an experienced C++ AI Inference Engineer to design, optimize, and deploy high-performance AI inference engines using modern C++ and processor-specific optimizations. You will collaborate with research teams to productionize cutting-edge AI model architectures for CPU-based inference.

Key Responsibilities

  • Collaborate with research teams to understand AI model architectures and requirements

  • Design and implement AI model inference pipelines using C++17/20 and SIMD intrinsics (AVX2/AVX-512)

  • Optimize cache hierarchy usage, NUMA-aware memory allocation, and matrix multiplication (GEMM) kernels

  • Develop operator fusion techniques and CPU inference engines for production workloads

  • Write production-grade, thread-safe C++ code with comprehensive unit testing

  • Profile and debug performance using Linux tools (perf, VTune, flamegraphs)

  • Conduct code reviews and ensure compliance with coding standards

  • Stay current with HPC, OpenMP, and modern C++ best practices

Required Technical Skills

Core Requirements:

  • Modern C++ (C++17/20) with smart pointers, coroutines, and concepts

  • SIMD intrinsics - AVX2 required, AVX-512 strongly preferred

  • Cache optimization - L1/L2/L3 prefetching and locality awareness

  • NUMA-aware programming for multi-socket systems

  • GEMM/blocked matrix multiplication kernel implementation

  • OpenMP 5.0+ for parallel computing

  • Linux performance profiling (perf, Valgrind, sanitizers)

Strongly Desired:

  • High-performance AI inference engine development

  • Operator fusion and kernel fusion techniques

  • HPC (High-Performance Computing) experience

  • Memory management and allocation optimization

Qualifications

  • Bachelor's/Master's degree in Computer Science, Electrical Engineering, or a related field

  • 3-7+ years of proven C++ development experience

  • Linux/Unix expertise with strong debugging skills

  • Familiarity with linear algebra, numerical methods, and performance analysis

  • Experience with multi-threading, concurrency, and memory management

  • Strong problem-solving and analytical abilities

Preferred Qualifications

  • Knowledge of PyTorch/TensorFlow C++ backends

  • Real-time systems or embedded systems background

  • ARM SVE, RISC-V vector extensions, or Intel ISPC experience

What You Will Work On

  • Production-grade AI inference libraries powering LLMs and vision models

  • CPU-optimized inference pipelines for sub-millisecond latency

  • Cross-platform deployment across Intel Xeon, AMD EPYC, and ARM architectures

  • Performance optimizations reducing inference costs by 3-5x