Category Archives: Networking

Ok, so here’s the deal: you want to monitor all your network infrastructure with Nagios (or something else, but these instructions are geared towards Nagios specifically), but you’re new to the program and so you’re starting with the little things first, like collecting system up-time on your switches.  The problem is that you’re just seeing service timeouts where you would like to see that up-time.  Fear not!  I have done the leg-work (or click-work) and present you with the answer to your problem in two parts.

The Switch

The first part is to get your switch set up to use SNMP version 3 (much more secure than versions 1 or 2c).  Log in to your switch and run the following:

conf t
snmpv3 enable

At this point, you’ll have to set up an initial user. Don’t sweat it–you’ll be deleting this user in a minute, so you can just enter junk information here.

snmpv3 only
snmpv3 user "username" auth sha "password" priv des "password"
snmpv3 group operatorauth user "username" sec-model ver3
no snmpv3 user initial
wr mem

Now that SNMP is enabled on your switch, you can move over to the Nagios end of things.
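
If you want to make sure the switch side is working before you touch Nagios, you can poll the up-time OID directly from any Linux box that has the net-snmp tools installed (the switch address below is just an example; swap in the user and password you created above):

snmpget -v3 -l authNoPriv -u username -a SHA -A "password" 10.0.0.2 1.3.6.1.2.1.1.3.0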

Nagios

Note: I’m using NConf to configure Nagios, and these instructions assume that you’re doing the same.  Wherever your snmp_check command is configured, set it up to execute with the following parameters:

snmp_check!-P 3 -a sha -U [username as set above] -A [password as set above] -L authNoPriv -o 1.3.6.1.2.1.1.3.0
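
If you’d rather sanity-check things from a shell before reloading anything, you can also run the underlying plugin by hand (this assumes your snmp_check command wraps the standard check_snmp plugin; the plugin path and switch address below are just examples and will vary by install):

/usr/lib/nagios/plugins/check_snmp -H 10.0.0.2 -P 3 -a sha -U username -A "password" -L authNoPriv -o 1.3.6.1.2.1.1.3.0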

Now just reload the Nagios service, and the next time a service check is run for up-time, you should get a nice number telling you how many days the switch in question has been up.

Giving credit where it’s due, the SNMP configuration information came from here (where you can also find information on how to configure Cisco switches).

When I took on the role of Systems and Network Administrator at my work, we had been using a Linux-based software firewall as the backbone of our network.  In fact, up until about seven weeks ago, we’d been running the same hardware box for around seven years.

Then it crashed.

Luckily, the crash was caused by a problem with the motherboard, and we had a recently decommissioned server running on the same hardware that we could just swap in without too many problems (though I did spend most of a weekend at work getting everything back up and running).  Once the firewall was back in service, though, more and more problems started coming out of the woodwork.  Our RADIUS-based MAC authentication was spotty at times, and whole classes were unable to access the network.  It was clear that something had to change, but until we could isolate the problems, we couldn’t even start.

Consultants were consulted, outside eyes looked over our infrastructure, and there was an “aha!” moment.  The Linux firewall, the heart of everything, had become insufficient.  Every year we had added more devices, and with the pilot of a one-to-one iPad program in our middle school, we hit the breaking point.  If our firewall had only been doing firewalling and routing, we might have been able to go on for another year, but with iptables, RADIUS, squid caching, routing, and DHCP all running on the same box, and with pretty much all of our traffic making several trips through its one internal interface, across its system bus, and competing for its CPU cycles, there was no way we could sustain that model indefinitely.

So what did we do?  We undertook a major overhaul of our core infrastructure, moving different services to different hardware.  You can (for a pretty penny) get switches that do both layer-2 switching and layer-3 routing at line speed.  We had a firewall appliance that had never been fully deployed, precisely because it takes a lot of work to break out all the services we had running on our firewall and keep everything running smoothly without the end users noticing a change.  Of course, with such a big change on an inherited network, there were things that didn’t get caught right away, but that always happens.  After some late nights, our network has smoothed out to the point that I’m not just putting out fires constantly.

But where does this leave the Linux firewall?  While I have a working, if somewhat limited, knowledge of Cisco switching, wireless, and internet-telephony solutions, their security appliances and layer-3 switches are mostly foreign to me.  I won’t claim that I’m an expert with iptables, but I knew my way around the command well enough to maintain things.  But the question is larger than this one case.

For small and even medium businesses, a Linux firewall is probably still the best and most economical choice for a serious network, as long as you have (or are willing to gain) the appropriate Linux wizardry, and as long as that box is only doing firewalling and routing.  If you have other services that you need to run on your network, put them somewhere else, especially something like RADIUS, where timely response packets are required for authentication.  However, if you’re supporting many hundreds of devices across multiple VLANs and expect to expand even further, a hardware-based solution will be a better investment in the long run, even if it’s a greater initial expense.

Tower defense games: everyone’s played them in some form, they’re pretty ubiquitous in this touchscreen world.  At lunch today, I hit upon a new twist that I don’t think has been tackled before: the WiFi needs of users at a large organization (in this case, a school).

Now imagine that they want to kill you because they can’t get on Facebook to post another twelve duckface selfies.

The idea is that you, the player, are the sysadmin on a school campus.  You have a budget for network infrastructure, and you need to keep the WiFi running and able to meet the demands of the users as they bring in more and more devices.  You can get status reports from various parts of your user base, much like the cabinet reports in SimCity 2000, and your life bar would tick down as more and more users streamed into the always-open door of your office.  Your campus would start small but eventually expand as you got more users (students).  The construction of buildings would, of course, affect the signal of your access points, and you would also have to make sure that you had enough switchports and network drops available to connect both APs and wired users.

The best part?  It teaches while you play!  Balancing user needs against a budget and battling all the issues that can come from a mix of WiFi infrastructure and user devices is a useful skill.  Will you force an SSID to go entirely to 5GHz at the risk of crippling a cart full of ancient laptops that only have 2.4GHz antennas?  What do you do when you start piloting a one-to-one program?

So my call to you, fair readers, is to help me make this game a reality.  I’m not a game designer, but I am a sysadmin, so I’m familiar with a lot of scenarios, and, with my training as a writer and my general nature, I have ideas and opinions, which I would be happy to share with (read: force upon) you if you decide to jump on board.

I also have a GitHub repo for the project.  I hope that you can help me make this project happen.

Yesterday, I got a first-hand demonstration of how a simple, well-meaning act of tidying up can have far-reaching consequences for a network.

Our campus uses Cisco IP phones both for regular communication and for emergency paging.  As such, every classroom is equipped with an IP phone, and each of these phones is equipped with a switch port, so that rooms with only one active network drop may still have a computer (or more often a networked printer) wired in.  If you work in such an environment, I hope this short story will serve as a cautionary tale about what happens when you don’t clean up.

I was working at my desk yesterday afternoon, already having more than enough to do, since the start of school is only a few days away, and everybody wants a piece of me all at once.  While reading through some log files, a bit of motion at the bottom of my vision caught my attention: the screen on my phone had gone from its normal display to a screen that just said “Registering” at the bottom left with a little spinning wheel.  Well, thought I, it’s just a blip in the system–not the first time my phone’s just cut out for a second.  So I reset my phone.  Then I looked and saw that my co-workers’ phones were doing the same thing.  Must just be something with our switch, I thought.  So I connected to the switch over a terminal session and checked the status of the VLANs.  Finding them to be all present and accounted for, I took the next logical step and reset the switch.  A couple minutes later, the switch was back up and running, but our phones were still out.

Logging in to the Voice box, I couldn’t see anything out of the ordinary, and the closest phone I could find outside of my office was fully operational.  Soon, I began getting reports that the phones, the wi-fi, and even the wired internet were down or at least very slow elsewhere on campus, though from my desk, I was still able to get out to the internet with every device available to me.  The reports, though, weren’t all-encompassing.  The middle school, right across a courtyard from my office, still had phones, as did the art studios next door, but the upper school was down, and the foreign language building was almost completely disconnected from the rest of the network–the few times I could get a ping through, the latency ranged from 666 (seriously) to 1200-ish milliseconds.

I reset the switches I could reach in the most badly affected areas.  I reset the core switch.  I reset the voice box.  Nothing changed.  I checked the IP routes on the firewall: nothing out of the ordinary.  Finally, in desperation, my boss and I started unplugging buildings, pulling fiber out of the uplink ports on their switches, then waiting to see if anything changed.  Taking out the foreign language building, the most crippled building, seemed like the best starting point, but was fruitless.  Then we unplugged the main upper school building, and everything went back to normal elsewhere on campus.  Plug the US in, boom–the phones died again–unplug it, and a minute later, everything was all happy internet and telephony.

We walked through the building, looking for anything out of the ordinary, but our initial inspection turned up nothing, so, with tape and a marker in hand, I started unplugging cables from the switch, one by one, labeling them as I went.  After disconnecting everything on the first module of the main switch, along with the secondary PoE switch that served most of the classroom phones, I plugged in the uplink cable.  The network stayed up.  One by one, I plugged cables back into the first module, but everything stayed up.  Then I plugged the phone switch back in, and down the network went again.

After another session of unplugging and labeling cables, I plugged the now-empty voice switch back in, hoping for the best.  The network stayed up.  Then I plugged in the first of the cables back into the switch.  Down the network went.  Unplug.  Back up.  Following the cable back to the patch panel, we eventually found the problem, missed on my initial sweep of the rooms: two cables hanging out of a phone, both plugged into ports in the wall.  For whatever reason, both ports on that wall plate had been live, and that second cable, plugged in out of some sense of orderliness, had created the loop that flooded the network with broadcast packets and brought down more than half of campus.

Take away whatever lesson you want from this story, but after working for almost four hours to find one little loop, I will think twice about hotting up two adjacent ports if they aren’t both going to be connected immediately and (semi)permanently to some device, especially if one of them is going to a phone.

Oh, yeah! Pie charts, baby!

While I still haven’t gotten many chances to really put it through its paces, I really love a lot of aspects of the MR12 Meraki sent me.  One of my favorite features is the ability to get a lot of granular detail on the network traffic clients are getting through the AP.  There are places where raw data is fine, but a lot of the time, I want a nice visual representation just so I can get a quick idea of what I’m working with.  This is especially true when we have weird hiccups or slowdowns on our network in certain areas.  Unfortunately, I don’t have Meraki APs everywhere, so I can’t just pull up a lot of sexy data and quickly figure out what’s up.  I’d really love to be able to do that, though.

So what does a sysadmin do when there’s a need but not a solution he knows?  Google, duh.

And what does Google give me?  It gives me ntop.  If you are a knowledgeable user and not just a luser, you have probably used top before to find out which processes are using the most of your memory and processor at any given time.  Well, ntop is something like that, only for networks.  But, more than that, it can give you nice graphical representations of your data through a web interface.

Having just run across ntop this very hour, I haven’t dived into the man pages for it yet, but I have written a long command chain so that I can read the things without standing in front of my Linux terminal, and, because sometimes I just want to write a long command, here’s what I did:

man -t ntop > ntop_man.ps && ps2pdf ntop_man.ps && rm ntop_man.ps && scp -P 22 ~/ntop_man.pdf [user]@[lappy]:/Users/[me]/Documents

So there.

(Of course you can expect me to post more about ntop as I dive in and find out what it can do for me and, by extension, what it might be able to do for you.)
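
In the meantime, if you want to kick the tires yourself, getting a basic ntop instance running should be pretty quick on a Debian-ish box.  The package name, the password flag, and the service command below are from memory, so treat this as a sketch rather than gospel:

sudo apt-get install ntop                # package name on Debian/Ubuntu
sudo ntop --set-admin-password           # set the password for the web interface
sudo service ntop start

Once it’s running, point a browser at port 3000 on that box and the graphs should be waiting for you.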

So I’ve got two different cloud-controlled APs in my office right now.  The Meraki is now hanging up on our equipment rack through creative use of the provided mounting hardware, and the Aerohive HiveAP 121 is sitting on the workbench, waiting on the authorization email from Aerohive that will get me set up with their cloud controller.  I really like the idea behind these things.  We have a fairly large campus, and it’s expanding across the street, which is miles away as far as I’m concerned, so I’m really interested in the idea of being able to manage our hardware from anywhere, whether I’m in the office, at home, or even (if the need were to arise) on vacation, as long as I had web access.  There are some things that I find bothersome, though.  Chief among them is how many of my resources I have to make security exceptions for in order for these cloud-controlled APs to work seamlessly with our existing systems.  Maintaining a level of security is part of my job description, so I don’t like the idea of putting a path between the web and the closed-garden systems on my network.

My most recent frustration in this regard cropped up today, just after I thought I’d sorted my previous problems out.  Our traditional APs are all controlled from an internal-only VLAN, and they sit on trunked lines that give them access to our other networks for the business of actually accessing the internet wirelessly.  That model, I’d already figured out, wouldn’t work right out of the box for the Meraki AP because there needed to be a path to the outside world so the cloud controller could talk to the AP, so I put the AP on a different network that can get out without a fuss.  But then: problems.

As I think I’ve mentioned before, our student and instructional networks authenticate clients against a RADIUS server that we have on campus.  Well, I know how to set up an AP to authenticate against a RADIUS server, so I thought there wouldn’t be any differences when it came to hooking the Meraki up to the thing.

What I didn’t know was that the cloud controller needs to access the RADIUS server, which would mean putting a hole in the firewall for that purpose.  I’m not particularly excited about that prospect.  I can see how it would be nice to have the RADIUS server accessible from the outside–we could, for instance, set up a remote AP at a totally separate site and let users authenticate normally.  Still, I’m using, or trying to use, these APs as nearly plug-and-play devices that I can drop into my existing network with a minimum of fuss and put through their paces, so I can decide whether I really want us investing in this technology (and in theory, I really would like to deploy these things).  In practice, it just hasn’t been working as I’d hoped.

In the end, I may just grin and bear it if I have to open some holes in order to make things work the way I’d like, but I really wish there were an option to let these APs work with our existing RADIUS server without making that server available to the outside world.  After all, except for the cloud controller itself, which is in a server farm somewhere (probably in California), everything I’m working with is inside one discrete network, so all the devices involved should be able to talk amongst themselves.
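
If I do end up grinning and bearing it, I will at least scope the hole as narrowly as possible.  Something along these lines is what I have in mind for the Linux firewall, with the controller’s address range and the internal RADIUS server’s address both stand-ins here rather than real values:

# forward RADIUS authentication traffic only if it comes from the controller's range
iptables -t nat -A PREROUTING -s 203.0.113.0/24 -p udp --dport 1812 -j DNAT --to-destination 10.0.5.10:1812
iptables -A FORWARD -s 203.0.113.0/24 -d 10.0.5.10 -p udp --dport 1812 -j ACCEPT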

Look what just showed up in the office today.

I can’t really say anything about performance, since the thing showed up just as I was leaving, so I haven’t done much more than take it out of the box.  I can say that, while I will give it a fair evaluation compared with the other hardware I have to test, I’m a little more inclined towards the Meraki unit at the moment, just based on build quality.  This Aerohive unit doesn’t feel as solid, partly because it lacks the metal backplate of its Meraki counterpart, and the plastic mounting brackets that it came with feel a bit flimsy.

Well, watch here for more information after I’ve had a chance to actually put this thing through its paces.

Last week, I attended a webinar (I don’t like that word) presented by Meraki on the subject of BYOD solutions in a K-12 environment.  The presentation was interesting, certainly, but perhaps the best part was that, as a technology professional who makes purchasing recommendations for an organization with a technology budget, I qualified for a free access point from Meraki.  Specifically, they sent me an MR12 access point and gave me a free several-year license for their cloud controller to go with it.

In case you don’t know Meraki’s deal, they make a full range of network hardware–APs, switches, and security appliances–which can be managed through the cloud from anywhere.  They also claim that, once you have your cloud controller set up, you can configure a new AP in as little as 15 minutes.  These are things that I like.  I also like getting free things to try out, so I was happy to get an opportunity to evaluate their hardware.

Simple packaging makes me happy.

Nothing superfluous here.

First of all, simple packaging makes me happy.  The box was just big enough for the AP, some mounting hardware, and a bit of padding to keep the thing safe.  No excess packaging.  There was barely any paper literature included in the box, either–just a single piece of paper listing the contents of the box and explaining, in short, that in this digital age, it’s stupid to print out documents that most people are going to throw away or lose anyway.  The access point itself seems to stick to this philosophy as well with a very spare aesthetic that I imagine would be next to invisible when deployed.

(Note, you don’t hear me talking in this unboxing video because I hate hearing my recorded voice and because narrating unboxing videos just seems awkward to me.)

So what happens after you get the thing out of the box?  Well, in my case, a lot of frustration to begin with.  My initial attempt was to configure the AP as if it were any other of our APs: putting it on our control network with a static IP, putting it in bridge mode so that clients connected through it sit on the LAN, setting it up to authenticate against our RADIUS server, and then making sure that the switch port it was getting plugged into was associated with the right VLANs.

Then I plugged it in, and nothing worked.  Sure, the power came on and I could see the AP on the cloud controller page, but the controller was reporting that the AP had never connected to it, and the lights were flashing in a way that told me it was getting a bad gateway.  That ain’t no good.

Well, I read through a lot of documentation, tried some things, and basically frustrated myself further over the course of an afternoon in which I kept running off to attend to more urgent matters.

Wawa travel mug for scale (and because it was there first, and I’m not kicking it out of its home for some upstart AP).

Today, after some more fiddling with the firewall and some quiet swearing, I decided, what the hell, I’d just plug it into a regular network port, skip bridge mode, and hope that it would work as simply as I’d been promised.  Wouldn’t you know?  It worked.  It came on no problem.  The lights all went green (well, the power light started flashing orange after a bit, but only because it was updating the firmware for the first time; after a reboot, it came back on no problem).  So now it’s sitting on my desk, plugged into the only network cable I had on hand, which runs through the switch in the back of my desk phone.  Maybe it’s not ideal, but I can finally start messing around with the cloud controller some more, really get an idea of what I’m working with, and decide whether I’m going to recommend that we invest in Meraki hardware as we expand and upgrade our wireless.

Expect to see more about this hardware and probably some competing wireless solutions on here in the coming weeks.

As part of my efforts to expand and strengthen our wireless infrastructure on campus, I swapped out one of our aging Cisco Aironet 1200s for an Aironet 1250 last week.  In theory, this was fine and a great thing to do.  In practice, I got a call just as I was walking to work saying that classrooms in that area were reporting problems with their WiFi.

I checked the usual suspects, made sure that DHCP was running fine, made sure that the RADIUS server was up and running, and tried several times, in vain, to correct the presenting problem from the web-end.  At a loss, I went back and switched out the new AP for the old one just as a hold-over until I could figure out what was up.

Then, while doing the initial configuration for a new Meraki access point that I’m going to be testing out as soon as it shows up in the office, I realized what the missing piece of the puzzle was: the shared secret.

If you’re using a RADIUS server for wireless authentication, each client (access point) needs to have a shared secret that both it and the server know in order for any authentication to happen.  If the Aironet 1250 I had put in had been totally new to us, then I wouldn’t have run into this problem because I would have entered the shared secret during the course of my initial configuration, but this AP was formerly located elsewhere on campus, so all I had done was to change the IP address for its ethernet interface to prevent conflicts.
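
Whichever side of the relationship is missing its half, both ends have to agree.  For reference, the server side lives in the RADIUS client list; on a FreeRADIUS box it looks something like the following, where the path, client name, IP, and secret are all made up for illustration:

sudo tee -a /etc/freeradius/clients.conf >/dev/null <<'EOF'
client aironet-1250 {
        # the AP's management IP
        ipaddr = 10.1.20.45
        # must match the shared secret entered on the AP itself
        secret = SuperSecretSharedKey
}
EOF
sudo service freeradius restart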

Now the AP has the correct shared secret, and everything is as it should be again, but let this be a lesson to all of you: share your secrets.

Today’s unintentionally artsy-fartsy out-of-focus picture of server room blinkenlights brought to you by my investigation into a stubbornly malfunctioning security camera.