Senior Lead Architect/Senior Manager - CDN/Network Automation
Technical lead and manager responsible for architecting, designing, and allocating work for a team of 3-6 engineers covering site reliability, configuration management, deployment, and automated failure recovery for the Lumen CDN, managing 20K servers, VMs, and containerized workflows. The CDN produced $100M+ in yearly recurring revenue while delivering more than 100 Tbps.
CMAPI - Architected and designed an authenticated RESTful API (FastAPI) to aggregate and cache CM data from 10+ systems. It is currently used by monitoring, config management, and repair automation as the source of truth for network disposition. The system continues to evolve as we add new data sources and receive requests for custom aggregations for both internal and external use. It is the foundation and source of record for all projects listed below.
Trial ROD - Implemented a GitOps model using REST APIs through GitLab CI/CD to identify clusters/servers for upgrade, apply new trial configurations through CMAPI interfaces, and use Release On Demand to deploy new applications/configurations to the network (bare metal and VMs). The project gave ownership of candidate releases to the engineers responsible for validating performance and accelerated candidate deployments from monthly to 1-5 per week.
Release On Demand - Architected and implemented staged deployment for 20K operational CDN servers based on the disposition of the network. RoD removed the onus of generating an upgrade schedule from operations while taking into account traffic levels, metros, server health, and network blackout dates. It increased the release cadence for the primary distribution network from 1-2 GA releases per year to our targeted goal of once per month.
SectionIO Compute Engagement - Operationalized a PoC for a partnership with SectionIO to repurpose CDN hardware that was no longer viable for edge delivery due to network constraints and use it for a multi-tenant, customer-facing compute product. Used Canonical MAAS to build and configure machines to hand over to SectionIO for proprietary application install and configuration. We were able to reallocate 800 out-of-date and impaired servers at no cost other than space, power, and networking.
Salt Versioned Formulas - Extended the Salt concept of environments to support multiple versions of CM code based on "Release States" and server role by rebuilding the environments on the fly prior to running a job. One control file defines which role and release state get which versions of code, allowing us to trial new formulas on a targeted network segment while knowing we would not affect other segments. Presented our results at the 2019 SaltConf.
RA/CM Replacement - Documented the existing architecture and defined and implemented the next-gen architecture for migrating from a wholly home-grown network automation system to a hybrid Salt solution. Incorporating Salt allowed us to incorporate changes to our growing network more consistently and quickly, including but not limited to new OSes, containers, and virtualization.
CI/CD - Incorporated and mandated the concept of CI/CD within the network automation code base. Provided templates and frameworks for consistent automated testing using VMs and containers with the pytest and Test Kitchen frameworks. Each new GitLab project is created with a set of configurations for virtual envs (Poetry, RVM), testing virtualization, linting, and deployment models. The developer just needs to add tests, either from scratch or using shared custom pytest libraries with common fixtures. This initiative has minimized the need for regression testing because tests grow naturally as bugs are found, fixed, and validated.
VM/Container Integration - Architected, designed, and implemented frameworks for deploying and configuring VMs (VMware) and containers (Docker/Kubernetes) in the production environment. Containers allow us to abstract away the OS layer and maintain some of our legacy applications while still addressing security concerns. Virtualization has reduced the time to grow and shrink our network from weeks (purchasing and allocating hardware) to minutes (defining a new server in inventory and using automation to stand it up and configure it).
Edge Portal - Implemented container-based deployment of a secure, publicly available interface using Traefik with TLS certificates, custom Golang/Node.js applications, KrakenD, Postgres DBs, and Elasticsearch/Kibana monitoring using Filebeat and Metricbeat.