Job title: Site Reliability Engineer (SRE)
Job type: Permanent
Employment type: Full-time
Salary type: Annual
Salary: Negotiable
Job published: 2025-06-03
Job ID: 96431
Contact name: Febrianto F
Phone number: +6567182347
Contact email: febrianto@linksinternational.com

Job Description

Site Reliability Engineer (SRE)

Location: On-site – Singapore
Industry: AI Infrastructure / Deep Tech
Represented by: Our client, a scaling AI technology company
Language Requirement: Bilingual – English and Chinese (Mandatory)

About the Opportunity

We are representing a forward-leaning AI infrastructure company developing scalable, high-performance platforms for intelligent systems. Their products serve real-time AI applications across enterprise and commercial markets.

As part of their continued growth, they are seeking a Site Reliability Engineer with proven experience in AI development environments, Agentic AI frameworks, or AI labs within tech or commercial companies.

Role Summary

This is a hands-on, on-site role in which you’ll maintain and scale production-grade infrastructure, support real-time deployment of intelligent services, and keep the platform stable under high load.

Key Responsibilities

  • Operate and scale container-based infrastructure (Kubernetes/Docker) in production.
  • Maintain CI/CD pipelines using GitLab CI and Argo CD.
  • Implement observability tooling for logging, monitoring, and alerting.
  • Automate operational tasks with Shell/Python scripts.
  • Troubleshoot critical incidents, perform root cause analysis, and implement recovery strategies.
  • Work with engineering teams to embed infrastructure-as-code and reliability best practices.
  • Participate in 24/7 on-call rotations for system support.

Requirements

  • Minimum 5 years of experience in SRE, DevOps, or platform engineering.
  • Proven background in managing infrastructure for AI-driven systems or agent-based platforms.
  • Strong hands-on experience with Kubernetes, Docker, and major cloud providers (AWS/GCP/Azure).
  • Proficiency in Linux systems and scripting languages (Shell, Python).
  • Deep understanding of services such as Nginx, Redis, Kafka, Elasticsearch, and MySQL.
  • Fluent in both English and Chinese (written and spoken) – required for team collaboration and system documentation.
  • Must be willing to work on-site in Singapore.

What’s Offered

  • High-impact engineering role with direct access to core AI infrastructure.
  • Collaboration within a multilingual, technically strong team.
  • Competitive salary and performance-based incentives.
  • Opportunity to shape platform reliability for next-gen AI systems.