1. My Experiences With DFS Replication on Windows 2008 R2

    Sun 15 June 2014

    If you are considering implementing DFS replication, consider using Windows 2012 R2, because DFS replication has been massively improved there. It supports larger data sets and performs dramatically better than on Windows 2008 R2.

    I've implemented DFS replication to keep two file servers synchronised. Click here or there if you want to learn more about DFS itself.

    With DFS, I wanted to create a highly available file server service, based on two file servers, each with its own physical storage. DFS replication makes sure that both file servers are kept in sync.

    If you set up DFS, you need to copy all the data from the original server to the secondary server. This is called seeding, and I used robocopy, as recommended by Microsoft in the linked article.

    Seeding is not mandatory. You can start with an empty folder on the secondary server and have DFS replicate all files. In my experience, however, DFS replication can be extremely slow on Windows 2008 R2.

    Once all files are seeded and DFS is configured, the initial replication can still take days. Replication times are based on:

    1. the number of files
    2. the size of the data
    3. the performance of the disk subsystems of both source and destination

    Note: Windows 2012 R2 improves DFS replication dramatically, which is all the more reason to upgrade your file servers to 2012 R2 or higher.

    If you seed the files, DFS will not transfer files that are identical, which saves bandwidth and time. DFS determines whether files differ by comparing their hashes, so even if you seed all data, the initial replication can take a while.

    On our virtualised platform, the initial replication of 2.5 GB of data consisting of about five million files took about a full week. To me, that is not a very desirable outcome, but once the initial replication is done, there is no performance issue and all changes are nearly instantly replicated to the secondary server.
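
    To put that week in perspective, here is a rough back-of-the-envelope calculation. It assumes exactly five million files and exactly seven days, which are only the approximate figures quoted above, and it uses integer arithmetic, so the result is rounded down.

    ```shell
    # Approximate figures from the text above; both are rounded assumptions.
    FILES=5000000
    SECONDS_PER_WEEK=$(( 7 * 24 * 60 * 60 ))   # 604800 seconds

    # Integer division: average number of files processed per second.
    FILES_PER_SECOND=$(( FILES / SECONDS_PER_WEEK ))

    echo "$FILES_PER_SECOND"
    ```

    That works out to roughly 8 files per second on average, which suggests the initial replication was limited by per-file overhead rather than by raw data volume.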

    For the particular configuration I've set up, the performance of the storage subsystem could contribute to the slow initial replication.

    To speed up the replication process, it's important that you install the latest version of robocopy for Windows 2008 R2 on both systems. There is a bug in older versions of robocopy that does not properly set permissions on files. This results in file hash mismatches, causing DFS to replicate all files, nullifying the benefit of seeding.

    Hotfixes for Windows 2008 R2: Hotfixes for Windows 2012 R2:

    To verify if a file on both servers has identical hashes, follow these instructions

    If you've checked a few files and made sure that the hashes are identical, it's OK to configure DFS replication. If you see a lot of Event ID 4412 messages in the DFS Replication event log, there is probably an issue with the file hashes.

  2. How Traffic Shaping Can Dramatically Improve Internet Responsiveness

    Sat 08 March 2014

    At work, access to the internet is provided by a 10 Mbit down / 1 Mbit up ADSL connection. As we are a mid-size company, bandwidth is clearly a severe constraint. But it was not our biggest problem. Even simple web browsing was very slow.

    As I was setting up a monitoring environment based on Nagios and pnp4nagios, I started to graph the latency of our internet connection, just to prove that we have a problem.

    Boy did we get that proof:

    bad latency

    Just look at the y-axis, whose scale is in milliseconds. For most of the day, the average latency is 175 ms, with some high spikes. Just browsing the web was a pain during times of high latency, which was clearly almost all of the time.

    I became so fed up with our slow internet access that I decided to take matters into my own hands and resolve the high latency issue. The solution? Traffic shaping.

    I learned that when ADSL connections are saturated, especially their upload capacity, you experience high latency and packet loss. So the trick is to never saturate the connection.

    I grabbed a Linux box with two network interfaces and placed it between our internet router and our firewall in bridge mode.

    For the actual traffic shaping I used wondershaper, which is available as a package in Debian and Ubuntu.

    apt-get install wondershaper
    

    The wondershaper script is extremely simple; it's specifically built to resolve the problem we faced with our ADSL connection. It not only prioritises traffic, it also lets you limit bandwidth usage and thus prevents you from saturating the connection.

    This simple example limits bandwidth a bit below full capacity, which dramatically improved latency.

    Syntax:

    wondershaper <interface> <rx> <tx>
    

    Example:

    wondershaper eth1 9500 700
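
    As a minimal sketch of where such numbers can come from: wondershaper takes its limits in kilobits per second, so the values in the example correspond to about 95% of the 10 Mbit downlink and 70% of the 1 Mbit uplink. The helper name shaped_rate and the percentages are assumptions made for illustration (the percentages are simply derived from the example values), not something wondershaper provides.

    ```shell
    # Hypothetical helper: compute a shaped limit from a nominal line rate.
    # $1 = nominal rate in kbit/s, $2 = percentage of that rate to allow.
    shaped_rate() {
        echo $(( $1 * $2 / 100 ))
    }

    # Assumed headroom: 95% of 10 Mbit down, 70% of 1 Mbit up.
    DOWN_KBIT=$(shaped_rate 10000 95)
    UP_KBIT=$(shaped_rate 1000 70)

    # Print the resulting command instead of running it, so this is safe to try.
    echo "wondershaper eth1 $DOWN_KBIT $UP_KBIT"
    ```

    The right percentages depend on the real throughput of your line, so start conservatively and raise the limits while watching the latency graph.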
    

    As you can see, latency improved dramatically:

    good latency

    Again, look at the y-axis. We went from an average latency of 175 ms to an average of 35 ms. That's quite an improvement.

    Can you spot on which day I implemented traffic shaping?

    week latency

    At the time of writing this blog post, the company is working on fiber internet access, resolving our internet woes, but it will take quite some time before that will be installed, so this is a nice intermediate solution.

    Tagged as : traffic shaping
  3. Monitoring HP MSA P2000 G3 I/O Latency With Nagios

    Tue 04 February 2014

    At work we have a couple of HP MSA P2000 G3 SANs. These are entry-level SANs that still seem to have almost all features you might want from a SAN, except for official SSD-support.

    It seems that the new MSA 2040 adds support for SSDs and also provides 4 GB of cache per controller instead of the somewhat meager 2 GB of the P2000.

    Anyway, a very nice feature of the MSA P2000 G3 is the fact that the management interface also provides a well-documented API that allows you to collect detailed stats on subjects like:

    • Overall enclosure status
    • Reports on failed drives
    • Controller CPU usage
    • IOPS per controller
    • IOPS per vdisk
    • IOPS per disk

    Thomas Weaver has written a Nagios plugin that does that: it collects this information and in turn you can graph it with pnp4nagios.

    In more recent firmware updates, HP has added support for monitoring read and write I/O latency per vdisk. Latency is an important indicator of application-level performance, so I was quite happy with that.

    As the plugin by Thomas did not support reading these parameters yet, I spent some time implementing this check and submitted the new version of check_p2000_api.php back to Thomas.

    Read Latency of a RAID 6 array

    readlatency

    Write Latency of a RAID 6 array

    writelatency

    You will notice that the write latency of this disk array is very high at times, which seems to indicate that this vdisk is overloaded with I/O requests.

    I'd like to thank Thomas Weaver for writing this plugin. I think it's very useful.

    Tagged as : Storage Nagios
