This role will be responsible for designing, deploying, and maintaining high-performance computing environments optimized for AI and machine learning workloads. The role involves building scalable infrastructure, ensuring efficient workload management, providing self-service and on-demand tooling, and collaborating with teams to support AI-driven applications. This role will drive operational excellence and work with diverse hardware and software solutions to enhance the performance and reliability of our on-premises AI/ML infrastructure.
Job Responsibilities:
Technical System Expertise: Understands system protocols, data flows, and how systems operate
Technical Engineering Services: Drives engineering projects by active contribution to the application of engineering techniques
Innovation: Contributes to designs that implement new ideas to improve existing and new systems, processes, and services
Technical Writing: Writes basic documentation on how technology works
Technical Leadership: Collaborates with technical teams and utilizes system expertise to deliver technical solutions
Technology Strategy: Contributes to new and existing technology options that support business goals
Requirements:
5+ years technical engineering experience, preferably in multiple technology focus areas
Expert understanding of AI/ML infrastructure components or GPU-based systems, preferably in a high-availability, large-scale environment
Hands-on experience with NVIDIA DGX servers, BasePOD architectures, and advanced GPU technologies
Proficient in Linux/UNIX environments, including scripting/automation tools (Bash, Python, Ansible, Terraform)
Understanding of AI infrastructure security best practices
Experience with containers and container orchestration (Docker, Kubernetes) and GPU workload management tools
Strong knowledge of networking (InfiniBand/Ethernet) and storage solutions in AI/ML contexts
Nice to have:
Understanding of CI/CD pipelines using tools such as Git, Artifactory, and Jenkins
Experience with AI/ML pipelines (PyTorch, TensorFlow, RAPIDS AI, or other deep learning frameworks)
Experience configuring and using monitoring tools (e.g., Prometheus, Grafana, NVIDIA DCGM)