I am currently looking for a Senior Systems Engineer (SRE) to join the EMS team to code, manage and leverage different tools and maximize their value. The role involves examining the existing monitoring setup, collaborating with stakeholders and building automated solutions that help detect, log and resolve events that could cause service disruptions.
- Analyze the organization's operational and/or functional needs and translate them into monitoring solutions, while remaining compliant with standard IT policies and procedures.
- Manage the life cycle (onboarding, maintenance, migration and retirement) of all monitoring tools and platforms under ownership. Administer and provide software support for monitoring tools, and perform the necessary customizations and implementations for any of the tool suites.
- Build a catalog with detailed descriptions of system monitoring parameters and integrate them to optimize overall effectiveness and efficiency.
- Handle day-to-day administration of the monitoring platform, with a focus on improvements that help reduce alert volumes without compromising system stability and availability.
- Maintain and support the infrastructure monitoring environment to ensure the highest availability while reducing the impact of incidents.
- Conduct in-depth evaluations of monitoring/alert data to assist with the diagnosis of various infrastructure and application problems.
- Test, recommend and implement new monitoring technologies. Retire underused and outdated monitoring technologies that carry higher costs and/or diminishing returns.
• 7+ years of experience, 3 of which were spent on infrastructure monitoring and site reliability
• Experience installing, configuring and maintaining monitoring software such as IPCenter (or equivalent), Dynatrace, AppDynamics, Splunk, SCOM, VMware vROps, AWS CloudWatch, Nagios or Azure Monitor
• Solid working knowledge of both Windows and Linux operating systems, including file and directory structures, commands, command-line interfaces and utilities
• Proficiency in scripting languages (Python, Perl, Shell or VBS) and in working with APIs, SQL and web services
• Knowledge of IT best practices in the following areas: IT infrastructure monitoring, data networks, IT security, virtualization, web servers, cloud and storage technologies
- Opportunity to understand enterprise-level infrastructure monitoring in a big company and to monitor each technology. Bring your previous enterprise monitoring and coding experience to change how things are currently set up and how we operate.
- Flexible working hours as long as the job is done.