
AI Engineer (Computer Vision & Multimodal)

We’re seeking an AI Engineer with strong hands-on experience in computer vision and multimodal (vision + voice) systems—from data preparation and model training to scalable inference and production deployment. You’ll build and ship end-to-end AI solutions: curate datasets, fine-tune and evaluate models, optimize inference pipelines, and operationalize services on cloud infrastructure.

Job Requirements

  • Bachelor’s degree in Computer Engineering / Software Engineering / Electronics / Computer Science or equivalent (Master’s preferred).

  • 0-3 years building production AI systems, with a focus on computer vision (PyTorch/TensorFlow).

  • Strong Python engineering skills; experience with REST/JSON APIs (FastAPI/Flask) and service design (a minimal serving sketch follows this requirements list).

  • Hands-on with cloud deployment for AI (AWS/GCP/Azure), containers (Docker), orchestration (Kubernetes/Cloud Run/ECS), and GPUs.

  • Experience training and serving CV models (ResNet/EfficientNet, YOLO/RetinaNet, U-Net/nnU-Net, Vision Transformers), and familiarity with ASR/TTS pipelines (e.g., Whisper, torchaudio).

  • Practical knowledge of model optimization (ONNX/TensorRT, quantization), and data tooling (Pandas, NumPy, OpenCV, ffmpeg).

  • Proficiency with MLOps tooling: experiment tracking (MLflow/W&B), model registry, CI/CD, monitoring/logging (Prometheus/Grafana/Cloud Monitoring).

  • Strong grounding in evaluation methodology: metrics, ablation studies, error/bias analysis, and reproducible research practices.

  • Comfort with cloud storage, queues, and databases; experience integrating AI services into existing systems.

  • 2+ years of hands-on computer vision experience preferred.

  • Nice-to-Have:

    • Experience with multimodal (vision + voice) workflows; familiarity with the Hugging Face ecosystem.

    • Knowledge of real-time/edge inference, TensorRT-LLM, vLLM.

    • Background in signal processing or audio engineering; familiarity with speaker diarization and the ethics of voice cloning.

    • Experience in regulated domains with human-in-the-loop (HITL) workflows and documentation.
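
To give candidates a feel for the day-to-day stack, here is a minimal sketch of the kind of REST/JSON inference service described above, assuming FastAPI and a pretrained torchvision ResNet-50; the /predict route and top-3 output format are illustrative, not a specification of our services.

    # Minimal FastAPI service wrapping a pretrained torchvision classifier.
    import io

    import torch
    from fastapi import FastAPI, File, UploadFile
    from PIL import Image
    from torchvision import models, transforms

    app = FastAPI()
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.eval()

    # Standard ImageNet preprocessing for the ResNet backbone.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    @app.post("/predict")
    async def predict(file: UploadFile = File(...)):
        # Decode the uploaded image and run a single forward pass.
        image = Image.open(io.BytesIO(await file.read())).convert("RGB")
        batch = preprocess(image).unsqueeze(0)
        with torch.no_grad():
            probs = model(batch).softmax(dim=1)
        top = torch.topk(probs, k=3)
        return {"class_ids": top.indices[0].tolist(),
                "scores": top.values[0].tolist()}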

Main Job Duties

  • Model development and inference

    • Implement, fine-tune, and evaluate CV models (classification, detection, segmentation, tracking) using PyTorch/TensorFlow.

    • Build robust inference services (REST/JSON) with batching, streaming, and hardware acceleration (GPU, TensorRT/ONNX).

    • Train and adapt voice/ASR/TTS models and integrate with vision pipelines (multimodal workflows).
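
A minimal sketch of such a multimodal (vision + voice) workflow, assuming Hugging Face pipelines with Whisper for ASR and a ViT image classifier; the model names and file paths are placeholders, not project choices.

    # Toy multimodal flow: transcribe speech and classify an image, then fuse.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
    classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

    def analyze(audio_path: str, image_path: str) -> dict:
        # Run each modality independently; audio decoding relies on ffmpeg.
        transcript = asr(audio_path)["text"]
        detections = classifier(image_path)  # list of {"label", "score"} dicts
        return {"transcript": transcript, "vision": detections}

    print(analyze("clip.wav", "frame.jpg"))  # placeholder inputs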

  • Data and evaluation

    • Own dataset lifecycle: collection, cleaning, labeling/specs, augmentation, and versioning.

    • Define metrics and test sets; run offline/online evaluations (accuracy, latency, throughput, calibration) and error analysis (see the evaluation sketch below).

    • Develop data transformation and feature pipelines; maintain data quality checks and bias/fairness assessments.
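
A minimal sketch of the offline-evaluation step referenced above, using scikit-learn metrics plus a rough latency measurement; the label arrays are stand-ins for real model outputs.

    # Offline evaluation: quality metrics and a rough per-call latency figure.
    import time

    import numpy as np
    from sklearn.metrics import classification_report, confusion_matrix

    y_true = np.array([0, 1, 1, 2, 2, 2])   # ground-truth labels (stand-in)
    y_pred = np.array([0, 1, 2, 2, 2, 1])   # model predictions (stand-in)

    print(classification_report(y_true, y_pred))  # per-class precision/recall/F1
    print(confusion_matrix(y_true, y_pred))       # where the errors concentrate

    def mean_latency_ms(fn, batch, n_runs: int = 50) -> float:
        # Average wall-clock time per call, in milliseconds.
        start = time.perf_counter()
        for _ in range(n_runs):
            fn(batch)
        return (time.perf_counter() - start) / n_runs * 1000.0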

  • Production engineering

    • Containerize and deploy models to cloud (AWS/GCP/Azure) using Docker/Kubernetes/Cloud Run/ECS.

    • Implement CI/CD, experiment tracking, a model registry, A/B canary releases, and rollout/rollback strategies (see the tracking sketch below).

    • Build monitoring for drift, performance, and cost; automate retraining or active learning loops.
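
A minimal sketch of the experiment-tracking duty above, assuming MLflow; the experiment name, parameters, and metric values are illustrative placeholders.

    # Log one training run to MLflow so results stay comparable over time.
    import mlflow

    mlflow.set_experiment("cv-detector-baseline")  # illustrative name

    with mlflow.start_run():
        mlflow.log_params({"lr": 3e-4, "epochs": 20, "backbone": "resnet50"})
        # ... training and evaluation would run here ...
        val_map = 0.0  # placeholder; the real value comes from the evaluation step
        mlflow.log_metric("val_mAP", val_map)
        mlflow.log_artifact("confusion_matrix.png")  # assumes this plot was saved earlier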

  • Systems integration

    • Design APIs and modules, integrate with upstream/downstream systems, and ensure reliable contracts and observability (see the schema sketch below).

    • Collaborate with product, design, and backend teams to turn requirements into measurable deliverables.
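
A minimal sketch of an explicit request/response contract, assuming Pydantic models; the field names are illustrative, not an existing API.

    # Explicit request/response schemas keep integration contracts unambiguous.
    from pydantic import BaseModel, Field

    class DetectionRequest(BaseModel):
        image_url: str
        min_confidence: float = Field(0.5, ge=0.0, le=1.0)

    class Detection(BaseModel):
        label: str
        score: float
        box: list[float]  # [x_min, y_min, x_max, y_max] in pixels

    class DetectionResponse(BaseModel):
        request_id: str
        detections: list[Detection]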

  • Performance and optimization

    • Optimize training/inference (quantization, pruning, distillation, mixed precision); leverage ONNX/TensorRT/torch.compile (see the export sketch below).

    • Profile and tune data loaders, GPU utilization, caching, and I/O.
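
A minimal sketch of two of the optimization levers above, assuming PyTorch's ONNX export and dynamic quantization utilities; shapes and file names are illustrative.

    # Export to ONNX, then apply dynamic int8 quantization for CPU inference.
    import torch
    from torchvision import models

    model = models.resnet50(weights=None)
    model.eval()

    # Fixed-shape export; dynamic axes can be added for variable batch sizes.
    dummy = torch.randn(1, 3, 224, 224)
    torch.onnx.export(model, dummy, "resnet50.onnx", opset_version=17,
                      input_names=["image"], output_names=["logits"])

    # Dynamic quantization of Linear layers to int8.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )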

  • Compliance and safety

    • Implement data governance, privacy, and security best practices; maintain audit trails and documentation.

    • Establish HITL workflows and guardrails for clinical or safety-critical contexts where applicable.

Your Next Step Starts Here

A space to grow, learn, and contribute to purposeful products.
