Switching to Exchange Server in Six Days

March 8, 2015

Yes. I switched our old mail server over to Exchange Server 2013 in six days. I had never touched Exchange Server before in my life. On Friday I began installing a new virtual machine for it and then on the following Wednesday evening I migrated our existing 90GB of e-mail over to it. Why? Because I am a complete moron and a glutton for punishment.

Actually everything went pretty well. The real reason I did such a rush job on this was that I didn’t want to wait another month for the next maintenance window to do the migration. In fact, this migration was so much easier than expected that I don’t really have much to explain, other than the recommendation that you not try to do it in six days. Take a little more time to learn the new product before rolling it out to your users. Then you can rest on the seventh day. Me? Not so much rest.

Coming from where?

Okay so for the longest time we have been running our mail on a custom-built hodge-podge of open source software. Now, I love open source software. It has let us get a lot of things up and running that we otherwise wouldn’t have been able to do. The problem is, I’m the only one who knows how to run it. We’re not talking about an open source appliance like a pfSense box or a ready-to-deploy mail VM that is all configured and ready to go. We’re talking about a complete custom-built solution – bad idea for the long haul.

Our mail system was a mix of dovecot, postfix, spamassassin, roundcube and mailman. A few months ago we moved our spam filtering off-site. The other four products were replaced by the single Exchange Server package. Again, I love those open source solutions; they served us well for many years. We have just grown too big. And let’s face it, with Microsoft’s non-profit pricing it cost us a whopping $600 for 100 users – one time fee. Okay actually it cost us $640 because I also bought an Exchange book that I never actually read.

Going to what?

So I installed Exchange Server 2013 on top of a Windows Server 2012 Standard installation, which is tied to our (new) Active Directory domain. That alone took me nearly a day and a half. I had to install from a 3GB ISO image and then it had to download 1.8GB of updates and reboot a half dozen times. Thankfully I could get in remotely from home, so I kept clicking “next” and “reboot” while watching TV, which saved some serious time.

Anyway that is all on a virtual machine in our ESXi cluster with 2 vCPUs and 8GB RAM. I started with a 160GB disk, figuring the OS would take about 30GB, leaving 130GB for mail, logs, etc. Considering we only had 90GB of mail I thought that would be plenty. Wrong. Exchange must have some massive indexing going on. Before starting the import I did indeed have around 130GB free. Towards the end of the import I (thankfully) noticed it was down to under 10GB free – oops. Happily I was able to extend the hard disk in VMware and Windows Disk Management on the fly, so I gave it another 40GB.

I’m not sure 8GB is enough for our user base (which is actually 85 users, the extra 15 licenses are spares / future use). The server is sitting at 6.8GB in use, only 0.9GB of which is cached data and only 1GB of free RAM. Committed memory is sitting at 12.8GB, which I assume is the amount of RAM + Swap space in use – so I might need to bump this thing up a bit.

This thing is a hog for CPU. This is running on a Dell R320 server with E5-2420 processors. Granted, those aren’t going to be blowing any records away, but they are no slouches for CPUs either. Today is our quietest day in the office and the quietest day of the week for spam (go figure, people don’t like sending spam on Sundays). CPU is still averaging around 15-20% (100% means both vCPUs maxed out). 35-40% was the average for one of our regular work days. What Exchange Server could possibly be doing in the background when nothing is happening I don’t know.

You got there how?

Okay, so basically I got the server fully installed late Saturday. On Sunday I began to set up all the accounts. If you are new to Windows Server do yourself a favor and buy a book on PowerShell. It’s not quite as easy as Bash scripting, but it’s still fairly easy to work with and makes things go a LOT faster. Since all my users already existed in Active Directory I was able to use PowerShell to automate the mailbox creation for all the mailboxes I needed.

After spending nearly the whole day getting the basic system functional and doing some basic tests, like sending an e-mail to myself to be sure mail was flowing, I needed a way to actually move all the mail from our old system to our new system. Now, a bulk migration like this requires you to know everybody’s password. I happen to know them because our old system, which we just migrated away from last month, stored all passwords in cleartext. I will not lose any sleep over that hard drive being riddled with bullets when this is all done.

So there is this great little program called imapsync. Like all great software, it’s free! Okay, just kidding, not all great software is free, but this one is. I think I read that there is a Windows port out there somewhere, but frankly do yourself a favor and run this on a Linux box of some kind. Here is the gist of it. You provide it two servers, two usernames and two passwords. Two complete sets of credentials. A source set and a destination set. It copies everything from the source mailbox into the destination mailbox. It has lots of cool options, like allowing you to either leave messages on the destination that are not in the source, or delete them so that the destination is in 100% sync with the source.

I used the delete option because I wanted the final destination mailboxes to be identical to their originals. It also includes a script you can use to do bulk migrations. So I set up a text file with a “username,password” pair on each line and passed it into the script. Well, actually I did a manual run on my own mailbox first, testing a few times until I found the right combination of options:

./imapsync --host1 localhost --user1 daniel --password1 youwish \
--authmech1 PLAIN --tls1 --host2 192.168.4.8 --user2 daniel \
--password2 youwish --authmech2 PLAIN --tls2 --useuid \
--disarmreadreceipts --delete2

The hostX, userX and passwordX fields are the login server and credentials to use. The authmechX option tells it which authentication mechanism to use and tlsX tells it to use TLS when connecting (certificates are not verified). The useuid parameter increases speed by using the IMAP UIDs of messages instead of their headers to find duplicates. The disarmreadreceipts option prevents the destination server from sending out “read receipts”, which would be bad since those messages were already read years ago. Finally the delete2 option tells it to remove any messages from the destination mailbox that do not exist in the source mailbox. You can also use the “dry” option to do a dry run before actually moving any data; I did that a few times.
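
For the bulk run, the idea of feeding a “username,password” file into a loop can be sketched roughly as follows. This is not the script that ships with imapsync, just a minimal illustration. The function names sync_user and migrate_file and the DRY_RUN switch are made up for this example, and it assumes every user has the same name and password on both servers, as in the command above.

```shell
# Hypothetical sketch, NOT the bulk script bundled with imapsync.
SRC_HOST="localhost"      # old server, as in the command above
DST_HOST="192.168.4.8"    # new Exchange server, as in the command above

sync_user() {
    # $1 = username, $2 = password (assumed identical on both servers)
    # Rebuild the argument list as the full imapsync command line.
    set -- ./imapsync \
        --host1 "$SRC_HOST" --user1 "$1" --password1 "$2" \
        --authmech1 PLAIN --tls1 \
        --host2 "$DST_HOST" --user2 "$1" --password2 "$2" \
        --authmech2 PLAIN --tls2 \
        --useuid --disarmreadreceipts --delete2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"    # only show what would be run
    else
        "$@"                  # actually run imapsync
    fi
}

migrate_file() {
    # $1 = file with one "username,password" pair per line.
    # Everything after the first comma is treated as the password.
    while IFS=, read -r user pass; do
        # Skip blank lines and lines starting with '#'.
        case "$user" in ''|'#'*) continue ;; esac
        # </dev/null keeps the command from swallowing the rest of the list.
        sync_user "$user" "$pass" </dev/null
    done < "$1"
}
```

Running it as `DRY_RUN=1; migrate_file users.txt` (users.txt being a placeholder file name) prints one command per user without touching either server, which is a cheap way to check the list before the real run.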

So I fired off that command with my information and then walked off to talk to some people. Five minutes later I came back and it was, well, crawling doesn’t quite describe it. It was averaging about two messages per second, and these were tiny messages. After I finished all the migrations and was doing some troubleshooting for some minor problems on Thursday I realized that the default throttle settings were the culprit. Turn off your throttle settings temporarily while migrating! There are already a ton of articles telling you how to make these changes, so I will just tell you which two values I changed: ImapMaxBurst and ImapRechargeRate. I think you might actually only need to set ImapMaxBurst to unlimited. Once it is unlimited I don’t think it would ever need to recharge.

Anyway, as I said I didn’t realize that until after I finished all the migrations. So, seeing how long that was going to take (no way was that getting done in one night like I had planned) I broke the user list text file into three and fired up the auto-run script on each of the three files. This way it would migrate three mailboxes at once. Sometime well into the next day it finished. After it finished I ran it again. And I kept running it every few hours over the next two and a half days. When it came time to do the final migration I only had to wait about 2 hours for it to run.
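
Splitting the list and running the pieces side by side could look something like the sketch below. The helper names split_list and run_parallel are hypothetical, and the command handed to run_parallel stands in for whatever script processes one list file.

```shell
split_list() {
    # $1 = list file, $2 = number of chunks, $3 = prefix for the chunk files
    # Line i is written to chunk (i mod N), so every line stays whole and
    # the chunks differ in length by at most one line.
    awk -v n="$2" -v prefix="$3" \
        '{ print > (prefix "." ((NR - 1) % n)) }' "$1"
}

run_parallel() {
    # $1 = command that takes one chunk file as its only argument
    # remaining arguments = chunk files
    cmd=$1
    shift
    for chunk in "$@"; do
        "$cmd" "$chunk" &    # one background job per chunk
    done
    wait                     # return only after every job has finished
}
```

For example, `split_list users.txt 3 users.part` would produce users.part.0, users.part.1 and users.part.2, and `run_parallel ./bulk-script users.part.0 users.part.1 users.part.2` would start three runs at once (both file names and ./bulk-script are placeholders). Dealing lines out round-robin balances the number of mailboxes per chunk, not their size, so one chunk can still finish much later than the others.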

I also discovered the day after the migration that some folders had not been migrated. No idea why, since I must have run each mailbox at least 6 times. After the first person mentioned they were missing a few folders I ran the script again with the extra options ‘--dry --justfolders’. This told it not to actually do anything, just tell me what was missing, and to operate only on folders, not the contents thereof. Well I found 3 mailboxes that had missing folders. One was missing two folders, another about five folders and the last was missing close to fifteen folders. A manual run of imapsync on each of those mailboxes got everything fixed up. You can also pass it ‘--folder MyFolder’ (or multiple instances of --folder) to sync only the folder(s) specified rather than everything. I had to do that because I didn’t want it to sync stuff back to their Inbox they had already deleted.
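
As a rough sketch of those two follow-up steps, the wrappers below first report missing folders and then sync only the folders named. The function names and the IMAPSYNC variable are invented for this example, the hosts are the ones from the command earlier in the post, and the same username and password are assumed on both sides.

```shell
IMAPSYNC="${IMAPSYNC:-./imapsync}"   # path to imapsync (hypothetical variable)

imapsync_base() {
    # $1 = username, $2 = password, remaining args = extra imapsync options
    user=$1
    pass=$2
    shift 2
    "$IMAPSYNC" --host1 localhost --user1 "$user" --password1 "$pass" \
        --authmech1 PLAIN --tls1 \
        --host2 192.168.4.8 --user2 "$user" --password2 "$pass" \
        --authmech2 PLAIN --tls2 "$@"
}

check_folders() {
    # Report folder differences only: --dry changes nothing and
    # --justfolders ignores the messages inside the folders.
    imapsync_base "$1" "$2" --dry --justfolders
}

sync_folders() {
    # $1 = username, $2 = password, remaining args = folder names
    user=$1
    pass=$2
    shift 2
    # Turn each folder name into a "--folder NAME" pair.
    n=$#
    while [ "$n" -gt 0 ]; do
        set -- "$@" --folder "$1"
        shift
        n=$((n - 1))
    done
    # No --delete2 here: only the named folders are copied across.
    imapsync_base "$user" "$pass" --useuid --disarmreadreceipts "$@"
}
```

A call such as `sync_folders daniel youwish Archive "Sent Items"` (folder names made up) leaves the Inbox alone, which is the point of restricting the run with --folder.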

The final thing to do that night was swap the DNS. I pointed DNS at the new server (by the way, I used a firewall rule to block outside access to the mail server while I migrated). In the morning when everybody came in their Mail worked – sorta. Everybody got their Mail to show up for a few seconds and then everything vanished while it spent the next few hours syncing everything again (this is when I noticed the throttling). Now, to be fair most everybody had a usable Inbox after about 5 minutes, but because of the throttling it ran very slowly while about 50 computers all synced at once. At this point I did try turning off the throttling and that just broke the server completely. I turned it back on but gave it a higher limit; I think I used 480,000 for ImapMaxBurst instead of the default of 60,000.

As I understand it those numbers basically mean how many milliseconds of CPU can be used in a 60 minute window. The default of 60,000 means each user can only use 1 minute of processor time every 60 minutes. Yeah, that didn’t fly too well when trying to sync a 1.5GB mailbox. At 480,000 that meant 8 minutes of processor time every hour.
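
Taking that reading of the numbers at face value (it is my understanding of the setting, not something checked against Microsoft’s documentation), the conversion is just milliseconds divided by 60,000. The helper name budget_minutes is made up for this sketch.

```shell
budget_minutes() {
    # $1 = budget in milliseconds; prints the equivalent in whole minutes
    echo $(( $1 / 60000 ))
}

budget_minutes 60000     # the default mentioned above
budget_minutes 480000    # the raised value mentioned above
```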

So if you are going to do something similar, I highly recommend that you notify your users ahead of time about the “mail flash”. It happens because the mail client detects that all the messages it knew about are gone, removes them and then re-downloads them from the new server. Users need to give Mail about 10-15 minutes after launching it to sync up enough to be happy.

And on the seventh day…

Okay so I didn’t actually get to rest on the seventh day. But the eighth day, the eighth day I slept like a baby. The day after the migration (day seven) I spent most of the day fixing minor problems. A forgotten mail alias here, a confused iPhone there. All in all, everything worked out very well. Most everybody was able to get work done with only a few minor complications and delays from the server being overloaded.

Now, to be fair I only moved the SMTP, IMAP and mailing list stuff over. Exchange Server also provides calendaring which still needs to be moved over. But we only have about 15 people using our central calendar server so I’m just going to move them over by hand. Because of all the calendar sharing and what-not I think it would be far easier than trying to do any kind of bulk transfer and then having shared calendars not work anymore. I’m sure I will be tweaking the install for weeks to come as well, but the system is up and running and surprisingly stable for an über-fast install by a novice.