
With all the net neutrality stuff circulating around, and in my position as netadmin/sysadmin at my school, I’ve started thinking about teaching a civics lesson on the subject.  I mean, I won’t, obviously, because I like my job, but that doesn’t stop me from thinking about it.

We have separate networks for students, teachers, and guests on campus, and which networks you can get on is dictated by what your MAC address is associated with in our RADIUS database.  Now, it wouldn’t be too difficult for me to set up a “Premium” student network, then throttle the bandwidth available to students on the standard student network.  Heck, with a bit of work, I could even block access to sites like Facebook for those on the standard student network.  Then, I just ask that students who want access to the “Premium” student network pay me $5 for the privilege.

The other reason I wouldn’t do this, obviously, is that I’m not a villain (and I don’t think there are many students, at least who are at or near voting age, who aren’t already for net neutrality).

If you’d like to make your voice heard, the Electronic Frontier Foundation (EFF) has put together a nice tool for submitting your thoughts on the public record about the FCC’s proposal.

Security: it’s hella important, yo.

I’m not going to try to get into all the technical details of Heartbleed–the OpenSSL team covers it a lot better than I probably ever could.  I’ll just say this: if you run a web server, and if you deal in any secure traffic, then do yourself and your users a favor and check to see if your version of OpenSSL is vulnerable to this MASSIVE security hole.  Various script-kiddies flocking together under the name of “Anonymous” are already gleefully distributing versions of the exploit code (which I have lots of opinions about, but now is a time for action, not for yelling about punks), but the bug itself has been sitting in released versions of OpenSSL for about two years now.

You can run your own tests on your servers, but the easiest way to get a little peace of mind is to run tests from here.
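
If you want a rough first pass before running a real test, the version string reported by `openssl version` is a hint. The sketch below is a minimal Python check, assuming upstream version numbering (OpenSSL 1.0.1 through 1.0.1f are affected; 1.0.1g is the first fixed release in that series). The function name is my own, and the check ignores the 1.0.2 betas. Many distributions also backport the fix without changing the version string, so treat a "vulnerable" answer as a reason to check your vendor's advisory, not as proof.

```python
import re

# Upstream OpenSSL 1.0.1 (no letter suffix) through 1.0.1f contain the
# Heartbleed bug; 1.0.1g is the first fixed release in the 1.0.1 series.
AFFECTED_LETTERS = {"", "a", "b", "c", "d", "e", "f"}


def looks_heartbleed_vulnerable(version_string):
    """Return True if the string names an affected upstream 1.0.1 release."""
    match = re.search(r"\b1\.0\.1([a-z]?)\b", version_string)
    if match is None:
        return False  # not in the 1.0.1 series at all
    return match.group(1) in AFFECTED_LETTERS


# Paste in whatever `openssl version` printed on the server in question.
print(looks_heartbleed_vulnerable("OpenSSL 1.0.1e 11 Feb 2013"))
```

A version check like this only narrows things down; an actual test against the running service is still the thing to trust.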

Do the responsible thing.  Keep the web safe for everyone.



From time to time we realize that sayings which make sense to us probably will have no meaning for future generations. Two of the examples that spring to mind are “hang up the phone” or in a vehicle you might “roll down the window”. And so is the case for today’s Retrotechtacular. Linux users surely know about TTY, but if you look up the term you actually get references to “Teletypewriter”. What’s that all about?

[Linus Akesson] wrote a fantastic essay on the subject called The TTY Demystified. We often feature old video as the subject of this column, but we think you’ll agree that [Linus’] article is worth its weight in film (if that can be possible). The TTY system in Linux is a throwback to when computers first became interactive in real-time. They were connected to the typewriter-mutant of the day known as a teletype machine and…


After the revelatory nature of the information I shared earlier this week, I felt on top of the world, but that illusion quickly shattered when I attempted to upgrade some of our newest (but still autonomous) access points, only to have my tftp requests time out.  A quick ? showed me that I could instead use scp (which has made appearances on this blog before), but the syntax was left as a mystery to me.  I have finally found the syntax, though (hint: it’s not quite the same as the normal *nix command) and have had considerable success upgrading our remaining autonomous units with that method.

Whereas with tftp, you simply entered the server address followed by the path to the file (relative to the tftp server folder), the Cisco version of scp is a bit more complicated.  My main tripping point was discovering what the file path for the image being downloaded was relative to.  I assumed it would start at the root of the filesystem, /, but instead the path is expressed relative to the home folder of the username specified.  I don’t know if using ../ will let you back out of your home folder, but it’s simple enough to copy the image to your home folder.  So, to use scp to download an image from your machine to a Cisco access point, you would use:

archive download-sw /reload /overwrite scp://username@server/path/to/image.tar

where the image path is relative to the home folder of username.
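
Since the home-folder-relative path is the part that tripped me up, here is a small Python sketch that builds the command from an absolute path on the server. The function name, the example username, server, and image file name are all made up for illustration; only the `archive download-sw /reload /overwrite scp://...` form comes from the command above. It refuses paths outside the home folder, since I haven't tested whether ../ works.

```python
import posixpath


def build_scp_upgrade_command(username, server, image_path, home_dir):
    """Build the AP command, with the image path made relative to home_dir."""
    relative = posixpath.relpath(image_path, home_dir)
    if relative == ".." or relative.startswith("../"):
        # Untested territory: the image would sit outside the home folder.
        raise ValueError("copy the image into the user's home folder first")
    return (
        "archive download-sw /reload /overwrite "
        f"scp://{username}@{server}/{relative}"
    )


# Hypothetical values, for illustration only.
print(build_scp_upgrade_command(
    "netadmin", "192.0.2.10", "/home/netadmin/images/ap-image.tar", "/home/netadmin"
))
```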

That’s all.  Happy scp-ing!

A quick post today, but no less informative, I hope.

We just installed a brand new Cisco wireless controller, and that means converting our older, autonomous access points to lightweight mode so they can interface with the controller.  Cisco would like you to use their (Windows-based) tool, which I tried initially.  While it may be easier and faster in an ideal situation, ideal situations are rare.  I looked around and couldn’t find a good text-based tutorial for doing the upgrade, but I did find some YouTube videos, one of which led me to my solution.

Before you get started, you’ll want to collect a few things.  First, you’ll need a recovery image specific to your access point.  You can download the image you need from Cisco–you’ll want the recovery image, which will crucially contain the string “rcv” in its file name.  Download the image and move it to your tftp server root (if you’re using Ubuntu, there’s a good guide for setting up a tftp server here if you don’t already have one).  Don’t worry about extracting the tarball–the access point will handle that for you.

If you’re not upgrading your access points in place, you may also want a serial connection to the AP so you can watch its progress the whole time, but this is optional.  I use minicom for my serial terminal on Ubuntu, though you may already have a package you prefer.

Now that you’ve got everything in place, telnet (or ssh) into your access point (and enter enable mode, but not configure mode) and run the following:

archive download-sw /reload /overwrite tftp://(ip address of your tftp server)/(name of recovery image tarball)

After the access point finishes downloading the image, it should restart automatically, but if there are any unsaved changes lingering on the system, use reload to restart the access point.  It will reboot, and if you’re watching on your serial console, you should see it going through the process of loading the recovery image, contacting the controller, and then downloading a full image before finally restarting again and coming under the controller’s full control.
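
If you have a stack of access points to convert, it can help to generate the command instead of retyping it. This is a minimal Python sketch; the function name and the example address and file name are hypothetical. It also checks for the "rcv" string mentioned above, so you don't accidentally push a non-recovery image.

```python
import ipaddress


def build_tftp_recovery_command(tftp_server_ip, image_name):
    """Build the archive download-sw command for a recovery image over tftp."""
    ipaddress.ip_address(tftp_server_ip)  # raises ValueError on a bad address
    if "rcv" not in image_name:
        raise ValueError("recovery images contain 'rcv' in the file name")
    return (
        "archive download-sw /reload /overwrite "
        f"tftp://{tftp_server_ip}/{image_name}"
    )


# Hypothetical server address and image name, for illustration only.
print(build_tftp_recovery_command("192.0.2.10", "ap-rcv-example.tar"))
```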

Backups, as far as many folks are concerned, are one of the most boring topics imaginable. Of course, that only extends to the point at which you need them; then they’re still boring, but you really really hope you have them. Most of the time, backing up your data is simple–if you’re at home, get an external drive, plug it into your machine, and make sure you back things up regularly.  If you’re at work, things often get a lot simpler than that: someone from IT provides you with a backup solution or even makes it happen without your needing to intercede. At work, backups become something of an assumed thing (provided that you’re not the person in IT who’s administrating them). Things only start to get sticky when you take that work computer home.

Imagine a situation in which your employer has issued you a laptop for work (not too hard to do, I should think). Because it’s portable, and maybe because it’s a better machine than what you own, you use it for non-work purposes, loading it with personal documents, photos, music, maybe even games. Depending on your organizational ethic, some of these personal files may be mixed in with your work documents (which are often considered to be the property of your organization, though your mileage may vary from this hypothetical situation). With your personal files mixed in with work files, your employer is now spending disk space on backing up non-essential (from their point of view) files; this is also probably known as wasting that space inside the walls of your tech office.

Yes, storage is cheap, but implementing that storage still takes time, which raises the question of how to deal with the issue of backing up only the files which your company can claim ownership over. Yes, you can issue backup drives to employees and put the onus of backups on them, but that raises its own set of problems.  User error happens, and often even the most vigilant of users will forget to make that backup at some critical juncture. Even trusting your users and stressing to them that they are responsible for their own backups, there are some users whose files are too important in one way or another to trust only to their own backups.  Too many users carry their backup drive, if it’s easily portable, in the same bag as their laptop, thus ensuring total loss of data in the event of a theft.

Cloud storage would seem to be a solution: just drop your files into a special folder, and they’re automagically available anywhere you have internet.  The problem comes back to separating personal data from work data, specifically in the event of employee termination or retirement. Not all cloud storage is created equal, and the most well-known solutions are mostly consumer/individual-oriented rather than enterprise-oriented. For Google Apps organizations, Drive is a fine solution, though it has its limits–permissions are nowhere near as fine-grained as you would find in a Windows domain or a UNIX-like filesystem, which presents a problem in more security-oriented organizations where a data breach could have serious consequences. If data is on a Drive account managed by a company’s domain, it is still recoverable if an employee leaves. This is not possible, though, with personal Google accounts, Dropbox, or other such services.

In the end, there isn’t any one easy solution–a robust data security/recovery strategy requires several levels and different considerations for different groups of users. No two organizations are likely to have exactly the same needs, but the questions should be largely the same.

Ok, so here’s the deal: you want to monitor all your network infrastructure with Nagios (or something else, but these instructions are geared towards Nagios specifically), but you’re new to the program and so you’re starting with the little things first, like collecting system up-time on your switches.  The problem is that you’re just seeing service timeouts where you would like to see that up-time.  Fear not!  I have done the leg-work (or click-work) and present you with the answer to your problem in two parts.

The Switch

The first part is to get your switch set up to use SNMP version 3 (much more secure than 1 or 2c).  Log in to your switch and run the following:

conf t
snmpv3 enable

At this point, you’ll have to set up an initial user. Don’t sweat it–you’ll be deleting this user in a minute, so you can just enter junk information here.

snmpv3 only
snmpv3 user "username" auth sha "password" priv des "password"
snmpv3 group operatorauth user "username" sec-model ver3
no snmpv3 user initial
wr mem

Now that SNMP is enabled on your switch, you can move over to the Nagios end of things.


Note: I’m using NConf to configure Nagios, and these instructions assume that you’re doing the same.  Wherever your snmp_check command is configured, set it up to execute with the following parameters:

snmp_check!-P 3 -a sha -U [username as set above] -A [password as set above] -L authNoPriv -o

Now just reload the Nagios service, and the next time a service check is run for up-time, you should get a nice number telling you how many days the switch in question has been up.
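
To make the pieces concrete, here is a rough Python sketch that assembles that argument string and converts the raw up-time value into days. Two assumptions are mine, not from the setup above: the OID is the standard sysUpTime instance (1.3.6.1.2.1.1.3.0), which I'm guessing is what belongs after -o, and the function names are invented. SNMP reports sysUpTime in TimeTicks, which are hundredths of a second.

```python
# Standard sysUpTime instance OID; assumed here as the value for -o.
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def build_snmp_check_args(username, password, oid=SYS_UPTIME_OID):
    """Assemble the snmp_check argument string in the form shown above."""
    # Passwords containing spaces or '!' would need escaping; not handled here.
    return (
        f"snmp_check!-P 3 -a sha -U {username} -A {password} "
        f"-L authNoPriv -o {oid}"
    )


def timeticks_to_days(timeticks):
    """Convert SNMP TimeTicks (hundredths of a second) to days."""
    seconds = timeticks / 100
    return seconds / 86400


# Hypothetical credentials, for illustration only.
print(build_snmp_check_args("nagios", "example-password"))
print(timeticks_to_days(8640000))
```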

Giving credit where it’s due, the SNMP configuration information came from here (where you can also find information on how to configure Cisco switches).

When I took on the role of Systems and Network Administrator at my work, we had been using a Linux-based software firewall as the backbone of our network.  In fact, up until about seven weeks ago, we’d been running the same hardware box for around seven years.

Then it crashed.

Luckily, the crash was caused by a problem with the motherboard, and we had a recently decommissioned server running on the same hardware that we could just swap in without too many problems (though I did spend most of a weekend at work getting everything back up and running).  After we got the firewall back up and running, though, more and more problems started coming out of the woodwork.  Our RADIUS-based MAC authentication was spotty sometimes, and whole classes were unable to access the network.  It was clear that something had to change, but until we could isolate the problems, we couldn’t even start.

Consultants were consulted, outside eyes looked over our infrastructure, and there was an “aha!” moment.  The Linux firewall, the heart of everything, had become insufficient.  Every year, we’ve added more devices, and with the pilot of a one-to-one iPad program in our middle school, we had hit the breaking point.  If our firewall had only been doing firewalling and routing functions, we might have been able to go on for another year, but with iptables, RADIUS, squid caching, routing, and DHCP all running on the same box, with pretty much all of our traffic making several trips through the one internal interface, the system bus, and asking for CPU clock cycles on our firewall, there was no way that we could sustain the model indefinitely.

So what did we do?  We made a major overhaul of our core infrastructure, moving different services to different hardware.  You can (for a pretty penny) get switches that do both layer-2 switching and layer-3 routing at line speed.  We had a firewall appliance that had never been fully deployed before precisely because it takes a lot of work to break out all the services we had running on our firewall and keep everything running smoothly without the end-user noticing a change.  Of course, with such a big change on an inherited network, there are things that didn’t get caught right away, but that always happens.  After some late nights, our network has smoothed out to the point that I’m not just putting out fires constantly.

But where does this leave the Linux firewall?  While I have a working, if somewhat limited, knowledge of Cisco switching, wireless, and internet-telephony solutions, their security appliances and layer-3 switches are mostly foreign to me.  I won’t claim that I’m an expert with iptables, but I knew my way around the command well enough to maintain things.  But the question is larger than this one case.

For small and even medium businesses, a Linux firewall is probably still the best, most economical choice if you have a serious network, as long as you have or are willing to gain the appropriate Linux wizardry, with the caveat that that box should only be doing firewalling and routing.  If you have other services that you need to run on your network, put them somewhere else, especially if you’re running something like RADIUS, where timely response packets are required for authentication.  However, if you’re supporting many hundreds of devices across multiple VLANs and expect to expand even further, a hardware-based solution will be a better investment in the long run, even if it’s a greater initial expense.

Tower defense games: everyone’s played them in some form, and they’re pretty ubiquitous in this touchscreen world.  At lunch today, I hit upon a new twist that I don’t think has been tackled before: the WiFi needs of users at a large organization (in this case, a school).

Now imagine that they want to kill you because they can’t get on Facebook to post another twelve duckface selfies.

The idea is that you, the player, are the sysadmin on a school campus.  You have a budget for network infrastructure, and you need to keep the WiFi running and able to meet the demands of the users as they bring in more and more devices.  You can get status reports from various parts of your user base much like you could get reports from your cabinet in Sim City 2000, and your life bar would tick down as more and more users stream into the always-open door of your office.  Your campus would start small but eventually expand as you got more users (students).  The construction of buildings would, of course, affect the signal of your access points, and you would also have to make sure that you had enough switchports and network drops available to connect both APs and wired users.

The best part?  It teaches while you learn!  Balancing user needs against a budget and battling all the issues that can come from a mix of WiFi infrastructure and user devices and needs is a useful skill.  Will you force an SSID to go entirely to 5GHz at the risk of crippling a cart full of ancient laptops that only have a 2.4GHz antenna?  What do you do when you start piloting a one-to-one program?
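
To give a feel for the core loop, here is a very rough Python sketch of the budget, devices, and life bar described above. Every number and name in it is a placeholder I made up (clients per access point, AP cost, how fast the life bar drops); none of it is a settled design.

```python
from dataclasses import dataclass


@dataclass
class Campus:
    budget: int
    access_points: int = 1
    devices: int = 20
    life: int = 100

    # Placeholder balance values, not real-world capacity figures.
    CLIENTS_PER_AP = 25
    AP_COST = 500

    def buy_access_point(self):
        """Spend budget on one more AP; return False if we can't afford it."""
        if self.budget < self.AP_COST:
            return False
        self.budget -= self.AP_COST
        self.access_points += 1
        return True

    def tick(self, new_devices):
        """Advance one turn: users bring devices, unserved ones drain life."""
        self.devices += new_devices
        capacity = self.access_points * self.CLIENTS_PER_AP
        unserved = max(0, self.devices - capacity)
        self.life = max(0, self.life - unserved)
        return unserved


campus = Campus(budget=1000)
print(campus.tick(new_devices=10), campus.life)
```

Signal loss from new buildings, switchports, and the 2.4GHz/5GHz trade-off would all hang off this same loop.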

So my call to you, fair readers, is to help me make this game a reality.  I’m not a game designer, but I am a sysadmin, so I am familiar with a lot of scenarios, and, with my training as a writer and my general nature, I have ideas and opinions, which I would be happy to share with you (or force upon you) if you decide to jump on board.

I also have a GitHub repo for the project.  I hope that you can help me make this project happen.

Yesterday, I got a first-hand demonstration of how a simple, well-meaning act of tidying up can have far-reaching consequences for a network.

Our campus uses Cisco IP phones both for regular communication and for emergency paging.  As such, every classroom is equipped with an IP phone, and each of these phones is equipped with a switch port, so that rooms with only one active network drop may still have a computer (or more often a networked printer) wired in.  If you work in such an environment, I hope that this short story will serve as a cautionary tale about what happens when you don’t clean up.

I was working at my desk yesterday afternoon, already having more than enough to do, since the start of school is only a few days away, and everybody wants a piece of me all at once.  While I was reading through some log files, a bit of motion at the bottom of my vision caught my attention: the screen on my phone had gone from its normal display to a screen that just said “Registering” at the bottom left with a little spinning wheel.  Well, thought I, it’s just a blip in the system–not the first time my phone’s just cut out for a second.  So I reset my phone.  Then I looked and saw that my co-workers’ phones were doing the same thing.  Must just be something with our switch, I thought.  So I connected to the switch over a terminal session and checked the status of the VLANs.  Finding them to be all present and accounted for, I took the next logical step and reset the switch.  A couple minutes later, the switch was back up and running, but our phones were still out.

Logging in to the Voice box, I couldn’t see anything out of the ordinary, and the closest phone I could find outside of my office was fully operational.  Soon, I began getting reports that the phones, the wi-fi, and even the wired internet were down or at least very slow elsewhere on campus, though from my desk, I was still able to get out to the internet with every device available to me.  The reports, though, weren’t all-encompassing.  The middle school, right across a courtyard from my office, still had phones, as did the art studios next door, but the upper school was down, and the foreign language building was almost completely disconnected from the rest of the network–the few times I could get a ping through, the latency ranged from 666 (seriously) to 1200-ish milliseconds.

I reset the switches I could reach in the most badly affected areas.  I reset the core switch.  I reset the voice box.  Nothing changed.  I checked the IP routes on the firewall: nothing out of the ordinary.  Finally, in desperation, my boss and I started unplugging buildings, pulling fiber out of the uplink ports on their switches, then waiting to see if anything changed.  Taking out the foreign language building, the most crippled building, seemed like the best starting point, but was fruitless.  Then we unplugged the main upper school building, and everything went back to normal elsewhere on campus.  Plug the upper school in, boom–the phones died again–unplug it, and a minute later, everything was all happy internet and telephony.

We walked through the building, looking for anything out of the ordinary, but our initial inspection turned up nothing, so, with tape and a marker in hand, I started unplugging cables from the switch, one by one, labeling them as I went.  After disconnecting everything on the first module of the main switch, along with the secondary PoE switch that served most of the classroom phones, I plugged in the uplink cable.  The network stayed up.  One by one, I plugged cables back into the first module, but everything stayed up.  Then I plugged the phone switch back in, and down the network went again.

After another session of unplugging and labeling cables, I plugged the now-empty voice switch back in, hoping for the best.  The network stayed up.  Then I plugged the first of the cables back into the switch.  Down the network went.  Unplug.  Back up.  Following the cable back to the patch panel, we eventually found the problem, missed on my initial sweep of the rooms: two cables hanging out of a phone, both plugged into ports in the wall.  For whatever reason, both ports on that wall plate had been live, and that second cable, plugged in out of some sense of orderliness, had created the loop that flooded the network with broadcast packets and brought down more than half of campus.
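
For anyone who wants to see why one extra cable matters, here is a small Python sketch that treats the network as a list of cables between devices and flags any cable that closes a loop. The device names are invented, and this is just a union-find pass over a hand-written cable list, not anything the switches themselves run.

```python
def find_loop_links(links):
    """Return the cables that connect two devices that are already connected."""
    parent = {}

    def find(device):
        parent.setdefault(device, device)
        while parent[device] != device:
            parent[device] = parent[parent[device]]  # path halving
            device = parent[device]
        return device

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            redundant.append((a, b))  # this cable closes a loop
        else:
            parent[root_a] = root_b
    return redundant


# Hypothetical cabling: the phone's second cable goes back to the same switch.
cables = [
    ("core-switch", "voice-switch"),
    ("voice-switch", "phone-room-12"),
    ("voice-switch", "phone-room-12"),
    ("phone-room-12", "printer-room-12"),
]
print(find_loop_links(cables))
```

Because the phone bridges its two ports like a tiny switch, the second cable gives broadcast frames a path back to where they came from, and they circulate until something is unplugged.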

Take away whatever lesson you want from this story, but after working for almost four hours to find one little loop, I will think twice about hotting up two adjacent ports if they aren’t both going to be connected immediately and (semi)permanently to some device, especially if one of them is going to a phone.