Job Description

Opportunity
We are looking for a Site Reliability Engineer to manage availability, latency, performance, monitoring, business continuity, and scaling problems and help us continue to build and deliver our managed accounts technology. You would be an integral part of SmartX’s operations team, ensuring that overall system health is monitored and that issues are remediated before they impact our clients.

Position Requirements

  • Demonstrable experience in monitoring and managing cloud-based applications and infrastructure
  • Demonstrable experience in designing code deployment solutions
  • Strong knowledge of monitoring tools and solutions 
  • Experience in writing clear, concise, and comprehensive postmortem reports
  • Experience with AWS and Google Cloud
  • Solid knowledge of SQL and scripting
  • Experience with performance, scaling, and security is a huge plus

Responsibilities

  • Design, write, and maintain software to improve the availability, scalability, latency, and efficiency of SmartX’s services, incorporating third-party open-source tools when available
  • Design and implement the tools and processes used for deployment and change management
  • Plan and execute configuration management
  • Own, maintain, and continuously improve all systems provided as a service, such as monitoring and datastores
  • Engage in service capacity planning and demand forecasting, anticipating performance bottlenecks
  • Automate resource provisioning and allocation processes
  • Run software performance analysis and system tuning
  • Plan and execute disaster recovery drills
     
More Details
Employment Type: Full Time
Location: West Palm Beach, Florida, United States
Experience Required: Mid-Senior Level
Date Published: 05 Nov 2019