MLOps Engineer
Job Description

Role: MLOps Engineer
Location: Remote (working EST hours); Canada-based candidates only
Duration: Long term
Must have: Minimum 6 years of experience; MLOps, DevOps, infrastructure, AI, LLMs

Our vision: To inspire new possibilities for the health ecosystem with technology and human ingenuity.

What is in it for you?

We are looking for a DevOps & MLOps Engineer with 5 years of experience to architect, deploy, and optimize the infrastructure for our commercial Generative AI product. This role is ideal for someone who thrives at the intersection of DevOps, MLOps, and AI infrastructure, ensuring secure, scalable, and cost-efficient LLM deployments. You will work with cutting-edge technologies like LLMs, vector databases, Databricks, and GPU scaling, helping us fine-tune and deploy AI models at scale.

Key Responsibilities

1. Cloud & Hybrid Infrastructure Management
· Architect and maintain a secure, scalable cloud infrastructure on AWS (preferred), GCP, or hybrid-cloud setups.
· Deploy GPU-accelerated compute clusters on AWS for cost-efficient model training and inference.
· Implement best practices for VPC networking, IAM security, encryption, and access controls.

2. MLOps & Model Deployment
· Build and maintain end-to-end MLOps pipelines for LLM training, fine-tuning, and inference.
· Optimize GPU utilization, autoscaling, and resource allocation for large-scale LLM workloads.
· Integrate Databricks and MLflow for scalable model training and tracking.
· Deploy models with TorchServe, Triton, vLLM, or Ray Serve for efficient inference.

3. CI/CD & Automation
· Develop CI/CD pipelines for model versioning, API services, and infrastructure automation using Terraform and GitHub Actions.
· Automate model deployment and rollback strategies for reliable AI system updates.

4. Observability, Performance Tuning & Cost Optimization
· Implement monitoring and logging tools (Prometheus, Grafana, CloudWatch) for LLM performance tracking.
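To give candidates a feel for the rollback automation described above, here is a minimal illustrative sketch (not part of the posting) of the kind of SLO guardrail a CI/CD rollback strategy automates; the metric names and thresholds are hypothetical.

```python
# Illustrative sketch: decide whether a newly deployed model version
# should be rolled back based on monitored canary metrics.
# Metric names and SLO thresholds here are hypothetical examples.

def should_roll_back(metrics: dict,
                     max_error_rate: float = 0.05,
                     max_p95_latency_ms: float = 2000.0) -> bool:
    """Return True if the new deployment breaches either SLO."""
    return (metrics["error_rate"] > max_error_rate
            or metrics["p95_latency_ms"] > max_p95_latency_ms)

# Example: a canary reporting elevated errors triggers a rollback.
canary = {"error_rate": 0.12, "p95_latency_ms": 850.0}
print(should_roll_back(canary))  # True
```

In practice such a check would run as a pipeline step (e.g. in GitHub Actions) fed by the monitoring stack, gating promotion of the new model version.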
5. Vector Databases & Retrieval-Augmented Generation (RAG)
· Deploy and optimize vector databases (Pinecone, FAISS, Weaviate, ChromaDB) for RAG-based LLMs.
· Improve search and retrieval efficiency to enhance AI model responses.

6. Security & Compliance
· Ensure secure AI model deployments with role-based access, encryption, and cloud security best practices.
· Comply with GDPR, SOC 2, and enterprise AI security requirements.

Required Qualifications
· 5 years of experience in DevOps, MLOps, or AI Infrastructure Engineering.
· Strong expertise in AWS (preferred), GCP, or hybrid cloud deployments.
· Hands-on experience deploying and scaling LLMs in production.
· Proficiency in Databricks, MLflow, and Spark-based ML workflows.
· Strong Kubernetes, Docker, Terraform, and CI/CD experience.
· Experience with GPU scaling, model quantization, and inference acceleration.
· Familiarity with LLM model serving (AWS SageMaker, Bedrock).
· Expertise in vector databases (Pinecone, FAISS, Weaviate, ChromaDB) for RAG workflows.
· Solid understanding of network security, IAM, and encryption.

Nice-to-Have Skills
· Experience with multi-cloud deployments and on-prem AI infrastructure.
· Familiarity with fine-tuning LLMs using LoRA, DeepSpeed, or Hugging Face.
· Exposure to AI cost optimization strategies (Spot Instances, serverless AI, GPU scheduling).
· Knowledge of LLM observability tools (WhyLabs, Arize AI, LangSmith).

Recruiter Details:
Name: Prashant Pal
Email: PrashantPvbeyond.com