Job Description
Agile Lab is a company founded in 2014 with the mission to create value for its customers in data‑intensive environments through customisable solutions that establish performance‑driven processes, sustainable architectures and automated platforms based on data governance best practices. Having delivered over 100 successful Elite Data Engineering initiatives, we have used this experience to create Witboost: a modular, technology‑agnostic platform that enables modern organisations to discover, value and produce their data in both traditional environments and fully compliant Data Mesh architectures. With a highly skilled team of over 260 data engineers based in Europe, Agile Lab helps organisations with their data‑driven transformation. Take a look at our handbook to discover our core values and processes.
Opportunity
We are looking for a Site Reliability Engineer II (SRE II) to join our growing team. You will play a key role in maintaining the reliability, observability, and operational efficiency of enterprise‑level distributed systems. In this role, you’ll coordinate a small technical team (3–4 people) in managing microservices in complex production environments. You will be involved in monitoring, incident management, release coordination, and performance tuning, with a strong focus on OpenShift platforms. You’ll also work closely with multiple cross‑functional teams to ensure high availability and performance of our cloud‑native services. This role includes on‑call availability.
Salary: 38.5K-48.5K
Responsibilities
Ensure high reliability of microservices running in OpenShift environments
Lead and coordinate a technical team of 3–4 engineers for operational excellence
Manage incident resolution and ticketing workflows via ServiceNow
Collaborate with development teams to drive performance optimization and tuning
Design, configure and maintain monitoring dashboards (Grafana, Prometheus, etc.)
Coordinate with Service Control Room to maintain effective alerting and response
Oversee release processes of new features, hotfixes, and updates in production
Requirements
Degree in Computer Engineering, Computer Science, or a related field
At least 2 years of proven experience in Application Maintenance Services (AMS)
In‑depth knowledge of OpenShift and microservices in cloud‑native environments
Ability to technically and operationally lead a team of 3–4 people
Experience in release management, monitoring, and incident resolution
Excellent communication and cross‑functional coordination skills
Strong initiative, operational autonomy, and results‑oriented mindset
Fluency in Italian (mandatory requirement)
Tech stack
Monitoring & Observability: Grafana, Prometheus, Kibana, Jaeger, Datadog, OpenTelemetry
Cloud/DevOps: OpenShift, GitLab, Jenkins
Data & Messaging: Kafka, MongoDB, Ignite
Ticketing & ITSM: ServiceNow
Benefits
Fully remote or hybrid work from our offices in Milan, Turin, Padua, Bologna, Catania and Rende
Real work-life balance
Monthly training budget (time and money)
Support from a buddy during your first week of work
Benefits and corporate welfare programs: company prizes and welcome pack with all the equipment you need to work
Agile Nomads Experience: opportunity to work for 2 weeks abroad
Referral bonus, if you bring people as talented as you
The opportunity to attend one conference per year
A company rated 4.8 out of 5 for employee satisfaction on Glassdoor and certified as a Great Place to Work
Inclusive environment where you can be who you really are
Stimulating environment oriented to growth, both professional and personal
How we work
We don't like hierarchies: we work as a team
We don't like bureaucracy: we prefer a sense of responsibility
We like data, of course, and therefore anything that is measurable
We want to make a positive change in our industry
Empathy, humility, collaboration, and willingness to challenge ourselves are the basis of our work
Please note
Only candidates based in European time zones (CEST or similar) will be considered for this position.