Notes on Network Automation
A simple misconfiguration took down hundreds of devices, and it changed how we think about CLI, automation, and scaling networks.
This week I spent some time going through network automation basics. Not deeply, just trying to understand what problem we are actually solving. I was not focusing on tools yet. More on the “why” behind all of this.
I would like to start with a scenario: a retail network with around 500 stores had a full outage during what was supposed to be a routine change.
Everything was managed through the CLI. During a security update, an ACL was pushed manually across devices. The configuration itself was not complex, but the order was wrong. A deny-all rule was placed before the permit rules. Because an ACL is evaluated top to bottom and the first matching rule wins, the permit rules were never reached.
That was enough to stop traffic everywhere.
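To see why rule order alone was enough, here is a minimal sketch of first-match evaluation. The rule format, the two example lists, and the `evaluate` helper are made up for illustration and are not any vendor's syntax; real ACLs match on far more than a source prefix.

```python
import ipaddress

# Hypothetical rule format: (action, source prefix). Not a vendor syntax.
BROKEN_ACL = [
    ("deny", "0.0.0.0/0"),       # deny-all placed first by mistake
    ("permit", "10.1.0.0/16"),
    ("permit", "10.2.0.0/16"),
]

FIXED_ACL = [
    ("permit", "10.1.0.0/16"),
    ("permit", "10.2.0.0/16"),
    ("deny", "0.0.0.0/0"),       # deny-all belongs at the end
]


def evaluate(acl, source_ip):
    """Return the action of the first rule whose prefix contains source_ip."""
    address = ipaddress.ip_address(source_ip)
    for action, prefix in acl:
        if address in ipaddress.ip_network(prefix):
            return action  # first match wins, later rules are never checked
    return "deny"  # implicit deny when nothing matches


if __name__ == "__main__":
    print(evaluate(BROKEN_ACL, "10.1.5.20"))  # deny
    print(evaluate(FIXED_ACL, "10.1.5.20"))   # permit
```

Same rules, same addresses, different order, opposite result.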
At first, it looks like a simple mistake. Someone configured it wrong. But after thinking about it for a while, it did not feel like just a human error. It felt more like a system problem.
Because one small mistake should not be able to take down an entire network like that.
That only happens when the same action is repeated manually across a large number of devices without any real validation in between. The mistake itself was small, but the way it was applied made it big.
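One way to picture "validation in between" is a rollout that works in small batches and stops at the first failed check. This is only a sketch under assumptions: `staged_rollout`, the batch size, and the two callbacks are hypothetical names, and the callbacks just simulate a device instead of talking to one.

```python
def staged_rollout(devices, apply_change, validate, batch_size=5):
    """Apply a change batch by batch, stopping at the first failed check.

    apply_change and validate are caller-supplied placeholders here;
    in a real setup they would talk to the devices.
    """
    applied = []
    for start in range(0, len(devices), batch_size):
        batch = devices[start:start + batch_size]
        for device in batch:
            apply_change(device)
            applied.append(device)
        failed = [device for device in batch if not validate(device)]
        if failed:
            # Stop here: only the devices touched so far are affected.
            return {"applied": applied, "failed": failed, "halted": True}
    return {"applied": applied, "failed": [], "halted": False}


if __name__ == "__main__":
    stores = [f"store-{n:03d}" for n in range(1, 501)]
    broken = set()

    def push_bad_acl(device):
        broken.add(device)  # simulate the misordered ACL landing on a device

    def traffic_still_flows(device):
        return device not in broken

    result = staged_rollout(stores, push_bad_acl, traffic_still_flows)
    print(len(result["applied"]), "of", len(stores), "stores touched")
```

With a check after each batch, the same bad change reaches 5 stores in this simulation instead of all 500.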
Where CLI Starts Feeling Uncomfortable
Most networks still run on CLI. Log in via SSH, run commands, check the output, move to the next device. That is how things are done.
But when the number of devices increases, something starts to feel off. Not immediately, but gradually. You spend more time repeating steps than actually thinking about what you are doing. It becomes mechanical.
And the more repetitive it gets, the more it depends on you not making a mistake.
That is the part that does not scale.
It is not about effort. It is about consistency. Doing the same thing correctly, every single time, across hundreds of devices. That is harder than it sounds.
First Thought Was, Just Automate It with Python
Naturally, the next thought was scripting.
Connect to devices, send commands, capture output. It feels like a direct upgrade from CLI. Same workflow, just automated.
This is probably why most people start here. It feels familiar and you can get something working quickly.
But after looking into it a bit more, there is something slightly uncomfortable about it.
You are still dealing with text.
The commands are text. The output is text. Everything depends on how that text looks.
And that is not stable.
A small change in output format can break everything. The information is still there, just displayed differently. That is enough for parsing logic to fail.
I had not really thought about this before. From a human perspective, the output still makes sense. From a script’s perspective, it can completely break.
So now instead of manually checking output, you are writing logic to interpret it. Which adds another layer that can go wrong.
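A small sketch of how this breaks. Both output lines below are made up, loosely modeled on typical interface status output, and `parse_status` is a hypothetical helper. The pattern is written against the first layout only.

```python
import re

# Two made-up outputs carrying the same facts in different layouts.
OLD_OUTPUT = "GigabitEthernet0/1 is up, line protocol is up"
NEW_OUTPUT = "GigabitEthernet0/1: admin up, protocol up"

# Pattern written against the old layout only.
STATUS_PATTERN = re.compile(r"^(\S+) is (\w+), line protocol is (\w+)$")


def parse_status(line):
    """Return (interface, admin_state, protocol_state), or None on no match."""
    match = STATUS_PATTERN.match(line)
    return match.groups() if match else None


if __name__ == "__main__":
    print(parse_status(OLD_OUTPUT))  # ('GigabitEthernet0/1', 'up', 'up')
    print(parse_status(NEW_OUTPUT))  # None
```

A person reads both lines the same way. The script gets a result for one and nothing for the other.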
It works, but it does not feel reliable when you think about scaling it.
APIs Felt Cleaner, But Took Time to Click
Then I looked at NETCONF and RESTCONF.
At first, it was not immediately clear why this was better. It just looked like another way of doing the same thing.
But after spending some time with it, the difference started to make sense.
Instead of sending commands, you are sending structured data. Instead of parsing output, you are receiving structured responses.
JSON or XML.
This changes things for the better.
You are no longer guessing where the data is. You are accessing it directly.
Error handling also becomes clearer. You are not searching for keywords in output, you are reading defined responses.
It feels more predictable.
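A rough sketch of what reading a structured response looks like. Both payloads are hand-written samples, shaped like the ietf-interfaces model and the RESTCONF error format as I understand them; no device is contacted, the exact shape varies by platform, and `summarize` is a hypothetical helper.

```python
import json

# Hand-written sample payloads (assumed shapes, not captured from a device).
SAMPLE_OK = """
{
  "ietf-interfaces:interface": [
    {"name": "GigabitEthernet0/1", "enabled": true}
  ]
}
"""

SAMPLE_ERROR = """
{
  "ietf-restconf:errors": {
    "error": [
      {
        "error-type": "application",
        "error-tag": "invalid-value",
        "error-message": "VLAN id out of range"
      }
    ]
  }
}
"""


def summarize(payload):
    """Read fields by key instead of searching text for keywords."""
    data = json.loads(payload)
    if "ietf-restconf:errors" in data:
        first_error = data["ietf-restconf:errors"]["error"][0]
        return ("error", first_error["error-tag"])
    interface = data["ietf-interfaces:interface"][0]
    return (interface["name"], interface["enabled"])


if __name__ == "__main__":
    print(summarize(SAMPLE_OK))     # ('GigabitEthernet0/1', True)
    print(summarize(SAMPLE_ERROR))  # ('error', 'invalid-value')
```

There is no pattern to maintain here. If a key is missing, the code fails loudly with a `KeyError` instead of silently matching nothing.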
Still, not everything is available via API. So CLI does not disappear. It just becomes one option among others.
Ansible and Terraform Were Not Obvious at First
This part honestly took me some time. Because it is not just a tool difference. It is a thinking difference.
Earlier the approach was always step by step. Run this command. Then this. Then check. Now it becomes something else. You define what should exist.
For example, instead of writing steps to create a VLAN, you just say VLAN 10 should be there.
The system figures out whether it needs to create it or not.
This felt strange initially. I kept thinking in steps, so this did not click immediately.
But after a while, it started to make sense.
You are not managing execution anymore. You are defining intent.
There is also this concept of idempotency, which sounds complex but is actually simple.
Running the same thing again should not change anything unnecessarily.
That turns out to be very important when you are dealing with repeated operations.
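The VLAN example can be sketched in a few lines. This is not how Ansible implements its modules; `ensure_vlan` is a hypothetical function and the dictionary is just a stand-in for a device's VLAN table.

```python
def ensure_vlan(device_vlans, vlan_id, name):
    """Make sure vlan_id exists with the given name. Return True if changed."""
    if device_vlans.get(vlan_id) == name:
        return False  # already in the desired state, nothing to do
    device_vlans[vlan_id] = name
    return True


if __name__ == "__main__":
    vlans = {1: "default"}  # stand-in for a device's VLAN table
    print(ensure_vlan(vlans, 10, "users"))  # True
    print(ensure_vlan(vlans, 10, "users"))  # False
```

The first run makes a change, the second run does nothing. That is all idempotency means here.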
Terraform goes a bit further.
It keeps track of what already exists and compares it with what you want. Then it calculates what needs to change.
That makes things more predictable. You can see changes before applying them.
It feels closer to managing a system rather than running commands on devices.
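The idea of comparing what exists with what you want can be illustrated with a toy diff. This is not Terraform's actual implementation; `plan` and the VLAN dictionaries are hypothetical and only show the shape of the idea.

```python
def plan(current, desired):
    """Compare recorded state with desired state and list the changes."""
    return {
        "create": sorted(k for k in desired if k not in current),
        "update": sorted(
            k for k in desired if k in current and current[k] != desired[k]
        ),
        "delete": sorted(k for k in current if k not in desired),
    }


if __name__ == "__main__":
    current_vlans = {10: "users", 20: "voice", 30: "lab"}
    desired_vlans = {10: "users", 20: "phones", 40: "guest"}
    print(plan(current_vlans, desired_vlans))
    # {'create': [40], 'update': [20], 'delete': [30]}
```

Nothing is changed by computing the plan, so it can be reviewed before anything is applied.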
Controllers Felt Powerful, But Also Slightly Distant
Then I looked at controller-based platforms. This is where you stop thinking about devices directly.
You define something like creating a network across multiple locations, and the system handles the rest.
This looks powerful, especially at scale. But it also feels a bit distant from what is actually happening underneath.
You are relying on the platform to do the right thing.
And while that works for standard use cases, I am not sure how it behaves in edge scenarios.
There is also a dependency on the vendor, which limits how flexible your setup can be.
So it feels useful, but not something that replaces everything else.
What I Thought Initially vs What I Think Now
Initially, I was trying to figure out which tool is best. That question does not really make sense anymore.
Each approach solves a different part of the problem.
→ CLI is simple but fragile at scale
→ Python scripting gives control but depends on parsing
→ APIs make things structured
→ Ansible and Terraform improve consistency
→ Controllers simplify large environments
It is not about picking one.
It is about understanding where each one fits.
What Actually Stayed With Me
Automation, at least from what I understand now, is not about doing things faster.
It is about making sure the same thing happens correctly every time.
Especially when the system grows.
What I Am Doing Next
Right now I am not jumping into complex automation setups.
Still trying to get the basics right: Python, network fundamentals, and simple scripts.
Trying to understand things properly before layering more complexity.
Closing Thought
I am still figuring things out. But one thing feels clear.
Manual processes do not fail immediately. They fail when they scale.


