Current Events > The weird DHCP issue at work is FIXED! Finally...

CableZL
08/31/23 12:37:43 AM
#1:


About a month ago, we upgraded the software on our Fortinet firewalls. After that, DHCP traffic wasn't going through properly at one of our data centers. I thought it was 100% a Fortinet issue since that's the only change that was made, and it doesn't make sense for only DHCP traffic to be dropped by any networking device.

Turns out the firewalls were still sending the traffic to the downstream core Cisco Nexus switches, but one of those Nexus switches was dropping all real DHCP traffic that passed through it. If we generated fake DHCP traffic with iperf, it went through. Real DHCP traffic got dropped. Cisco couldn't find a fix through troubleshooting, so they recommended that we reboot the switch. We rebooted that core switch tonight and boom, the problem isn't happening any more.

This is one that all of our network engineers will have to keep in mind, because we periodically fail back and forth between our two data centers.
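A rough illustration of the "fake vs. real" distinction above: traffic from a generator like iperf aimed at UDP port 67 only resembles DHCP by port number, while a real client sends a BOOTP header, the RFC 2131 magic cookie, and options. The sketch below builds a minimal DHCPDISCOVER payload and checks whether a payload parses as DHCP. The function names, the MAC address, and the all-zero "generator" payload are made up for illustration, and nothing in this thread confirms that payload inspection is what the switch was actually doing.

```python
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"  # RFC 2131 magic cookie
BOOTP_HEADER_LEN = 236                   # fixed BOOTP header size in bytes


def build_dhcp_discover(mac: bytes, xid: int) -> bytes:
    """Build a minimal DHCPDISCOVER payload (UDP payload only, no IP/UDP headers)."""
    if len(mac) != 6:
        raise ValueError("expected a 6-byte Ethernet MAC address")
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,            # op: BOOTREQUEST
        1,            # htype: Ethernet
        6,            # hlen: MAC address length
        0,            # hops
        xid,          # transaction ID chosen by the client
        0,            # secs
        0x8000,       # flags: broadcast bit set
        bytes(4),     # ciaddr
        bytes(4),     # yiaddr
        bytes(4),     # siaddr
        bytes(4),     # giaddr
        mac,          # chaddr (struct pads to 16 bytes with zeros)
        b"",          # sname (zero-filled)
        b"",          # file (zero-filled)
    )
    options = bytes([53, 1, 1]) + bytes([255])  # option 53 = DISCOVER, then END
    return header + DHCP_MAGIC_COOKIE + options


def dhcp_message_type(payload: bytes):
    """Return the DHCP message type (option 53), or None if this is not DHCP."""
    cookie_end = BOOTP_HEADER_LEN + 4
    if len(payload) < cookie_end or payload[BOOTP_HEADER_LEN:cookie_end] != DHCP_MAGIC_COOKIE:
        return None
    i = cookie_end
    while i < len(payload):
        code = payload[i]
        if code == 255:          # END option
            break
        if code == 0:            # PAD option has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 53 and len(value) == 1:
            return value[0]
        i += 2 + length
    return None


# Hypothetical locally administered MAC and transaction ID, for illustration only.
real = build_dhcp_discover(bytes.fromhex("020000000001"), 0x12345678)
generated = bytes(len(real))  # stand-in for arbitrary generator payload on port 67

print(dhcp_message_type(real))
print(dhcp_message_type(generated))
```

Anything on the path that parses DHCP rather than just forwarding UDP would treat these two payloads differently, which is one way a generator test can pass while real clients fail.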

---
https://i.imgtc.com/d9Fc4Qq.gif https://i.imgtc.com/BKHTxYq.gif
https://i.imgtc.com/vYYIuDx.jpg
Tyranthraxus
08/31/23 12:42:05 AM
#2:


CableZL posted...
About a month ago, we upgraded the software on our Fortinet firewalls. After that, DHCP traffic wasn't going through properly at one of our data centers. I thought it was 100% a Fortinet issue since that's the only change that was made, and it doesn't make sense for only DHCP traffic to be dropped by any networking device.

Turns out the firewalls were still sending the traffic to the downstream core Cisco Nexus switches, but one of those Nexus switches was dropping all real DHCP traffic that passed through it. If we generated fake DHCP traffic with iperf, it went through. Real DHCP traffic got dropped. Cisco couldn't find a fix through troubleshooting, so they recommended that we reboot the switch. We rebooted that core switch tonight and boom, the problem isn't happening any more.

This is one that all of our network engineers will have to keep in mind, because we periodically fail back and forth between our two data centers.

Wait.

So it took how many engineers and combined Cisco certifications to try rebooting the switch?

---
It says right here in Matthew 16:4 "Jesus doth not need a giant Mecha."
https://i.imgur.com/dQgC4kv.jpg
Kloe_Rinz
08/31/23 12:44:25 AM
#3:


Tyranthraxus posted...
Wait.

So it took how many engineers and combined Cisco certifications to try rebooting the switch?
You generally don't reboot network gear, as it will cause an outage (depending on what redundant gear is set up).
CableZL
08/31/23 12:45:22 AM
#4:


Tyranthraxus posted...
Wait.

So it took how many engineers and combined Cisco certifications to try rebooting the switch?
In an enterprise scenario, rebooting is the last step you take because it takes down a lot of other services.

Tyranthraxus
08/31/23 12:49:22 AM
#5:


CableZL posted...
In an enterprise scenario, rebooting is the last step you take because it takes down a lot of other services.
My job has HA everything. We can basically reboot anything, even a database server, without an outage. It's awesome. It wasn't like that when I got hired, but the CTO made it a priority and we finally got there a few years ago.

We can even reboot host machines without downtime; we just need a few minutes to evacuate them first.

CableZL
08/31/23 12:52:52 AM
#6:


Tyranthraxus posted...
My job has HA everything. We can basically reboot anything, even a database server, without an outage. It's awesome. It wasn't like that when I got hired, but the CTO made it a priority and we finally got there a few years ago.

We can even reboot host machines without downtime; we just need a few minutes to evacuate them first.

That's cool. We have most things on our network in HA, but the other problem we have is that we have a lot of sensitive applications that could be flowing through things on our network. Rebooting the switch during the day likely would have been fine, but there's no way we would have had approval to reboot it during the day because of the chance of interrupting an application during business hours.

I don't think any of us expected anything to actually just be straight up dropping DHCP traffic like this, either.

Tyranthraxus
08/31/23 12:58:29 AM
#7:


CableZL posted...
That's cool. We have most things on our network in HA, but the other problem we have is that we have a lot of sensitive applications that could be flowing through things on our network. Rebooting the switch during the day likely would have been fine, but there's no way we would have had approval to reboot it during the day because of the chance of interrupting an application during business hours.

I don't think any of us expected anything to actually just be straight up dropping DHCP traffic like this, either.

It's definitely a weird problem. We don't have the same network needs as you though so everything is static & manually routed by an external and internal load balancer.

I don't know much about how that shit works though. Load balancer is basically fucking magic as far as I'm concerned.

#8
Post #8 was unavailable or deleted.
CableZL
08/31/23 8:52:13 AM
#9:


Tyranthraxus posted...
It's definitely a weird problem. We don't have the same network needs as you though so everything is static & manually routed by an external and internal load balancer.

I don't know much about how that shit works though. Load balancer is basically fucking magic as far as I'm concerned.
Yeah, when I started, everybody on the network team was responsible for the entire network architecture, so I started to learn a lot about F5 load balancers. They've since split the team up into different groups and I'm on the team that handles a certain type of branch location. It's rare that I ever touch data center equipment other than stuff at the edge these days.

#10
Post #10 was unavailable or deleted.
Questionmarktarius
08/31/23 12:03:34 PM
#11:


There's a reason "have you tried turning it off and on again?" is sound IT support.
#12
Post #12 was unavailable or deleted.
CableZL
08/31/23 12:10:19 PM
#13:


Yeah, the vast majority of the time, enterprise network problems are fixed by running commands, not by rebooting the device. Rebooting a device in an enterprise environment is often like using a bomb to kill an ant. Rebooting is also rarely the actual solution to enterprise network problems. That's why Cisco TAC was troubleshooting for over 8 hours on the phone before they recommended a reboot.

#14
Post #14 was unavailable or deleted.
CableZL
08/31/23 7:59:14 PM
#15:


Yeah, Cisco TAC is the best support team I've worked with. I wish their products weren't getting priced out of the market.

DrizztLink
08/31/23 8:00:08 PM
#16:


It doesn't seem like it'd be that hard to fix the Dead Hot Chili Peppers.

---
He/Him http://guidesmedia.ign.com/guides/9846/images/slowpoke.gif https://i.imgur.com/M8h2ATe.png
https://i.imgur.com/6ezFwG1.png
flussence
08/31/23 8:06:40 PM
#17:


i love completely inexplicable network problems

Haven't been able to get the erlang.org docs to load for months now. Every other site, fine. The *redirect* to www, that works fine. I've spent an hour looking at Wireshark and found nothing. Nobody else has any idea, or the same problem.

---
Editor's note: The sound of children screaming has been removed.
punkfanalways
08/31/23 8:07:39 PM
#18:


CableZL posted...
In an enterprise scenario, rebooting is the last step you take because it takes down a lot of other services.

Plus there's always the risk it doesn't come back up, more so if you're remote. I generally like Fortinet and have NSE4, but sometimes it's difficult to convince clients to part with their cash when they could buy a DrayTek or a more entry-level unit.
punkfanalways
08/31/23 8:11:09 PM
#19:


flussence posted...
i love completely inexplicable network problems

Haven't been able to get the erlang.org docs to load for months now. Every other site, fine. The *redirect* to www, that works fine. I've spent an hour looking at Wireshark and found nothing. Nobody else has any idea, or the same problem.

Very niche, but we had a similar issue. Turned out an upstream router had a typo in a static route, and all traffic for a couple of sites was dropped or timed out. It randomly caused an issue about three years after the change was made, so that was difficult to diagnose.

Doubt that applies to your situation, but thought I'd mention it.
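For anyone wondering how one bad static route can break only "a couple of sites": routers pick the most specific matching prefix, so a mistyped specific route beats a working default route for just the destinations it covers. A minimal sketch of that lookup follows. The prefixes and next hops are documentation-range addresses invented for illustration, and the real router's table isn't known from this thread.

```python
import ipaddress


def lookup(routes, destination):
    """Return the next hop of the longest (most specific) matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if dest in net]
    if not matches:
        return None
    # Longest-prefix match: the route with the largest prefix length wins.
    _, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return next_hop


# Hypothetical routing table: a working default route plus one static route
# whose next hop was mistyped and points at an address that forwards nothing.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "198.51.100.1"),      # default route, fine
    (ipaddress.ip_network("203.0.113.0/24"), "192.0.2.254"),  # typo'd next hop
]

print(lookup(ROUTES, "203.0.113.10"))  # covered by the bad /24
print(lookup(ROUTES, "198.51.100.7"))  # only the default route matches
```

Only destinations inside the mistyped prefix follow the dead next hop; everything else keeps working, which is why this kind of fault looks like "one site is broken for no reason".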