Please do not apply for this role if you are not physically located in the Americas (UTC-8 to UTC-4 / PST to EST).
While this is a remote position, we cannot consider candidates who are not based in these regions. You can find a detailed explanation in our Recruitment FAQs.
At Hotjar, we’re creating Product Experience Insights software for digital product teams. We help show how users behave and what they feel strongly about, so product teams can deliver real value, fast. As part of our Engineering team, you'll work on the exciting challenges that come with large-scale web traffic and analytics.
We work in an agile and highly collaborative environment, 100% remotely, and challenge the norms of traditional business leadership. Our focus is on true transparency and respect.
We're looking for an enthusiastic Engineering Lead who is passionate about reliability at scale: someone who loves being part of a team, whilst enjoying the autonomous nature of working remotely.
Programming experience is required for this role (we use Python a lot), as is solid work experience with AWS and Kubernetes. Lots of ideas about monitoring and SLOs are a plus. Experience working in a SaaS/product company and a background in running reliable operations are also key.
About the job:
- Reporting to our SRE Team Lead, be part of an agile team that sets up and maintains the infrastructure powering our applications and services on AWS.
- Design, deploy and maintain tools and services to support a robust infrastructure.
- Collaborate with and mentor engineers in product teams to help them bring up new microservices.
- Ensure all necessary monitoring, alerting and backup solutions are in place, using SLOs to guide prioritization and put reliability front and center.
- Dive into large codebases, unafraid to write more than a few lines of Bash.
- Spend a small amount of your time dealing with incidents and internal change requests. This is not a service-desk or incident-only position; the vast majority of your time will be spent creating and optimizing our tools and infrastructure, not firefighting.