Job Information
Procter & Gamble Intelligent Operations SRE (Site Reliability Engineer) in Warsaw, Poland
Job Location
Warsaw
Job Description
As the SRE, you will be responsible for managing and optimising the internally developed, P&G-wide Observability and Event Management platforms.
You will split your time between operational work (50%) and engineering work (50%) to ensure system reliability, optimize cost, meet compliance and security requirements, and deliver new functionality for the business. Detailed responsibilities are as follows:
Operational support for the Europe region:
Hands-on work on incident management, client onboarding, and customer requests, ensuring timely resolution and adherence to SLIs, SLOs, and SLAs.
Continuously improve SLOs and SLIs.
Reduce Mean Time to Restore Service and Mean Time to Resolve Requests.
Global Platform reliability and resilience:
Drive the implementation of self-observability and self-healing capabilities, leveraging industry-leading tools and technologies.
Design, maintain, and regularly test business continuity processes.
Design, maintain, and regularly test disaster recovery mechanisms.
Perform changes, upgrades, and regular maintenance tasks for existing solutions, ensuring system stability and optimized performance.
Global Platform cost:
Drive visibility of cost components and their association with the pricing model.
Run ongoing FinOps processes; identify and execute cost-saving initiatives.
Self-service of platform capabilities:
Translate insights gained from customer requests and onboardings into actionable proposals for automation, new capabilities, and process changes that increase self-service among users.
Increase the scope of self-service capabilities and drive their adoption among the user community.
Automate the onboarding of new projects and plants onto the platform.
Platform enhancements:
Collaborate closely with product and engineering teams to influence the product roadmap, provide valuable input for product increments, and align sprints with operational requirements.
Design and develop selected user stories from the product backlog.
Job Qualifications
Technical Expertise and Experience:
Extensive knowledge of and experience in IT technologies, spanning operating systems, network infrastructure, and cloud platforms. The following technologies are required for the role (note: we are open to candidates who have a solid foundation or an incomplete mix of these skills but are determined to learn on the job):
Proficiency in Kubernetes, with hands-on experience in running and investigating workloads in a Kubernetes environment.
Strong scripting and automation skills.
Familiarity with GitHub.
Hands-on experience with Cloud platforms (Azure preferred) and infrastructure provisioning.
Familiarity with observability ecosystem tools such as Prometheus, Thanos, Grafana, etc. would be advantageous but not mandatory.
Bachelor's degree in Computer Science, Information Systems, or a related field; a Master's degree is a plus.
Soft skills:
Strong planning and organizational skills, enabling effective work and task management for oneself.
Strong problem-solving and troubleshooting skills, with the ability to analyse complex issues and devise effective solutions.
Effective communication skills, with the ability to convey real-time information during incidents and to translate technical issues into clear messages for non-technical stakeholders.
Proactive and self-motivated, with a continuous learning mindset and a drive for staying updated with industry trends and technologies.
Ability to thrive under pressure and effectively manage incidents, ensuring timely resolutions and minimizing downtime.
Job Schedule
Full time
Job Number
R000119184
Job Segmentation
Recent Grads/Entry Level