Automation without validation: Risky operation

If you run a large, complex network, you have either already heavily invested in automating key management tasks or are about to. Network automation is a great way to reduce human errors and accomplish those tasks with consistency and speed.

"To err is human; to really foul things up requires a computer." - Bill Vaughan

But network automation is not without risks. One risk is bugs in the automation logic itself, which occur because handling the diversity of network vendors and devices effectively is hard. Another risk is humans providing incorrect inputs to automation. One senior network engineer recounted to us an incident that drives this point home. His team had automated data center network expansion. A script populated most of the configuration for new devices, but it needed humans to fill in details such as the AS number. Inevitably, on one of the many occasions the script was used to provision a new device, an engineer fat-fingered the AS number. The mistake disrupted many key services for an hour.
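A mistyped AS number is the kind of input error a simple pre-flight check can catch before the script generates any configuration. The sketch below is illustrative only: the function name `validate_asn` and the idea of an approved-ASN set are assumptions, not part of the incident described above or of any particular tool.

```python
from __future__ import annotations

# Usable 32-bit AS numbers; 0 and 4294967295 are reserved.
MIN_ASN = 1
MAX_ASN = 4294967294


def validate_asn(raw: str, allowed_asns: set[int]) -> int:
    """Parse an operator-supplied AS number and reject anything unexpected.

    `allowed_asns` is a hypothetical allow-list of AS numbers the team
    actually uses; a real deployment would load it from a source of truth.
    """
    try:
        asn = int(raw.strip())
    except ValueError:
        raise ValueError(f"AS number {raw!r} is not an integer") from None
    if not MIN_ASN <= asn <= MAX_ASN:
        raise ValueError(f"AS number {asn} is outside the valid range")
    if asn not in allowed_asns:
        raise ValueError(f"AS number {asn} is not in the approved set")
    return asn
```

With an allow-list of `{65001, 65002}`, the input `"65001"` is accepted, while a slip such as `"65010"` raises an error instead of reaching a device. A check like this catches typos, but it cannot tell whether a well-formed, approved value is the right one for a given device.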

While errors in non-automated environments may only impact individual routers, errors in automated environments can bring down the entire network in one fell swoop. In a way, automation trades frequent, low-impact errors for infrequent, high-impact ones.

Further, automation rarely covers every change an operator needs to make. Even with good automation, operators must manually make changes that the automation framework does not support. A common pattern, for instance, is to generate new configurations via automation but make subsequent changes manually. Unfortunately, even a single manual change can invalidate the original correctness guarantees.

Enter network validation, a technology to guarantee that the configurations generated by automation or humans are correct. “Correctness” may be specified in terms of best practices, compliance requirements, or intended data flow behaviors (e.g., all external traffic should traverse a firewall). It may also be defined based on the expected impact of planned changes, e.g., no users should lose connectivity or no services should be unavailable after the change.
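To make the firewall example concrete, here is a toy sketch of how such an intent could be checked. It assumes the network's forwarding behavior has already been reduced to a directed graph of next hops; the node names and the helper functions are hypothetical. The idea is to remove the firewalls from the graph and see whether protected hosts are still reachable from the outside. If they are, some path bypasses the firewall.

```python
from collections import deque


def reachable_without(graph, start, blocked):
    """Return every node reachable from `start` without entering `blocked` nodes."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in seen and nbr not in blocked:
                seen.add(nbr)
                queue.append(nbr)
    return seen


def firewall_bypasses(graph, external, firewalls, protected):
    """List protected hosts that external traffic can reach around the firewalls."""
    exposed = reachable_without(graph, external, set(firewalls)) & set(protected)
    return sorted(exposed)


# Hypothetical forwarding graph: node -> list of next hops.
GOOD = {
    "internet": ["edge"],
    "edge": ["fw"],
    "fw": ["core"],
    "core": ["server_a", "server_b"],
}
# The same network after a change that adds a direct edge-to-core path.
BAD = dict(GOOD, edge=["fw", "core"])
```

Here `firewall_bypasses(GOOD, "internet", ["fw"], ["server_a", "server_b"])` returns an empty list, while the same call on `BAD` reports both servers as exposed. A production validator would derive the graph from device configurations and reason per packet header rather than per node, which this sketch does not attempt.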

Good validation technologies will ensure that the correctness guarantees hold for all possible packet flows and device/link failures and will account for differing device behaviors even in complex, multi-vendor networks. No human can accomplish this via manual review of configuration or testing in a lab environment.
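As a rough illustration of what "all link failures" means, the sketch below enumerates every single-link failure in a small, made-up topology and reports the links whose loss would disconnect two endpoints. The topology and function names are assumptions for illustration. Real validation tools cover far more (multiple simultaneous failures, routing protocol behavior, all packet types) and do not rely on brute-force enumeration like this.

```python
def connected(links, src, dst):
    """Return True if `dst` is reachable from `src` over undirected `links`."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return dst in seen


def single_points_of_failure(links, src, dst):
    """Return each link whose failure alone disconnects `src` from `dst`."""
    return [
        failed
        for failed in links
        if not connected([l for l in links if l != failed], src, dst)
    ]


# Hypothetical leaf-spine fabric with a singly attached server.
LINKS = [
    ("leaf1", "spine1"),
    ("leaf1", "spine2"),
    ("spine1", "leaf2"),
    ("spine2", "leaf2"),
    ("leaf2", "server"),
]
```

For this topology, `single_points_of_failure(LINKS, "leaf1", "server")` returns only `("leaf2", "server")`: the fabric survives any single spine link failure, but the server's one uplink does not have a backup. Even this tiny case shows why manual review struggles, since the number of failure scenarios grows quickly with network size.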

Typically, network validation is invoked during the change review and testing phase and before configuration changes are pushed to the network. Unlike lab tests and human reviews, this validation is full-scale and fully automatic. Network validation closes a key gap in the CI/CD (continuous integration/continuous deployment) pipeline of configuration deployment and provides a robust line of defense against configuration errors.
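One way to picture validation as a pipeline stage is a gate that runs a set of checks against the candidate configuration and returns a non-zero exit code if any of them reports a violation, so the CI system blocks the push. The sketch below is a minimal, hypothetical version: the `gate` function and the check names are made up for illustration, and each check is assumed to return a list of human-readable violations.

```python
from typing import Callable, Dict, List

# A check takes no arguments and returns a list of violation messages.
Check = Callable[[], List[str]]


def gate(checks: Dict[str, Check]) -> int:
    """Run every check; return 0 if all pass and 1 if any reports a violation."""
    failed = 0
    for name, check in checks.items():
        violations = check()
        if violations:
            failed += 1
            for violation in violations:
                print(f"FAIL {name}: {violation}")
        else:
            print(f"PASS {name}")
    return 1 if failed else 0
```

A pipeline step could then call something like `sys.exit(gate({"firewall-traversal": check_firewall, "no-lost-connectivity": check_connectivity}))`, where both check functions are placeholders for whatever validation the team has defined. Running every check before returning, rather than stopping at the first failure, gives the reviewer the full list of problems in one pass.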

If you are serious about network automation, you should get serious about network validation as well. It will safeguard your network against human and automation errors and provide the peace of mind you need as you evolve your network faster than ever.

Getting started is easy! Try the open source Batfish tool.