Site Reliability Engineer-1

Bengaluru, Karnataka, India | Full-time


About MoEngage

MoEngage is an intelligent customer engagement platform, built for customer-obsessed marketers and product owners. We enable hyper-personalization at scale across multiple channels like mobile push, email, in-app, web push, on-site messages, and SMS. With AI-powered automation and optimization, brands can analyze audience behavior and engage consumers with personalized communication at every touchpoint across their lifecycle.

Fortune 500 brands and enterprises across 35 countries, such as Deutsche Telekom, Samsung, Ally Financial, Vodafone, and McAfee, along with internet-first brands such as Flipkart, Ola, OYO, and Bigbasket, use MoEngage to orchestrate their cross-channel campaigns and engage efficiently with their customers, sending 80 billion messages to 900 million consumers every month.

Our vision is to build the world’s most trusted customer engagement platform for the mobile-first world.

We promise to care about your customers as much as you do, and that is reflected in our top ratings for service and support in the Gartner Magic Quadrant, Gartner Peer Insights, and G2 Summer Reports. We have also been recognized as one of the 25 Highest Rated Private Cloud Computing Companies To Work For in a list released by Battery Ventures, a global investment firm. The list is based on employee feedback on Glassdoor, where employees reported the highest levels of satisfaction at work during the first six months of the pandemic.

Our latest funding round, a $32.5 million Series C1 in July 2021, accelerates our vision of creating impeccable experiences for our customers globally. We recently crossed the 400-employee mark and are still growing.


As part of the Engineering team at MoEngage, here are some things you can expect:

  • Take ownership and be responsible for what you build - no micromanagement
  • Work with A players (some of the best talent in the country) and accelerate your learning and career growth
  • Make in India and build for the world at the scale of 500M active users, which no other internet company in the country has seen
  • Learn from other teams how they scale to millions of users and billions of messages


Here are some of the challenging areas you can expect to work on as part of the SRE team:

  • Maintain services once they are live by measuring and monitoring availability, latency and overall system reliability.
  • Work closely with team members to ensure best practices and strategic goals are incorporated into development work.
  • Collaborate with other engineering teams to identify and anticipate changing requirements and opportunities to improve the development environment.
  • Monitor at scale with VictoriaMetrics and similar tools.
  • Orchestrate and manage workloads with Kubernetes (K8s) and similar tools.
  • Implement best practices, challenge the status quo, and keep tabs on industry and technical trends, changes, and developments to ensure the team is always striving for best-in-class work.
  • Manage capacity, build security into every layer, and reduce cost.
  • Implement secure networking, key management, user management, access management, process management, and image management.
  • Effectively lead and manage team deliverables (short- and long-term), including project planning and coaching, quarterly reviews, participation in the selection process for new hires, and technical and non-technical guidance to the team.

Skill Requirements:

  • Proven experience in handling large infrastructure and distributed systems like YARN, Kubernetes, Elasticsearch, etc.
  • Tech Stack - Python, AWS, Azure, Linux.
  • Familiarity with Python-related technologies and frameworks like Falcon, Django, or Pyramid.
  • Experience with Unix/Linux operating system internals and administration (e.g. filesystems, inodes, system calls, etc.) or networking (e.g. TCP/IP, routing, network topologies and hardware, SDN, etc.).
  • Familiarity with cloud computing infrastructure, preferably Azure.
  • Familiarity with task queue frameworks like Celery or Pika is a plus.
  • Experience with source code management and implementing security best practices.
  • Familiarity with at least one container orchestration tool, as well as build, artifact, packaging, and service discovery management tools.
  • Know-how in gathering metrics across distributed systems (instances/containers) and generating automated notifications and reports.
  • Prowess in analyzing application bottlenecks and performance degradation, and in implementing automated processes/tools to detect such anomalies.
  • Good understanding of, and implementation experience with, 12-factor app principles.

Mandatory Skills:

  • 3+ years of experience on the AWS/Azure platform.
  • Proficiency in Python or shell scripting.
  • Hands-on experience with container technologies (K8s, Argo CD, Helm/Kustomize).
  • An "automate anything" mindset.
  • Experience with AWS/Azure cost explorer, billing analysis, and various cost optimization techniques.
  • Awareness of cloud security concepts.
  • Awareness of information security concepts and best practices.

Good to have:

  • AWS/Azure cloud certification.
  • Certified Kubernetes Administrator (CKA) certification.
  • Certified Kubernetes Application Developer (CKAD) certification.
  • Experience with configuration management tools and strong code analysis skills in Python.
  • Experience working with APM tools like New Relic.


At MoEngage, we are passionate about our team and technology. We handle more than a billion messages every day. Rest assured, you will be surrounded by really smart and passionate people as we scale further and build a world-class technology team.