If you work with Fibre Channel on a regular basis and have come to depend on its excellent performance attributes, you might be surprised to learn that many of the cases handled by our SAN team in Customer Support are related to performance problems. Typically, as we dig into these cases, we discover that the majority of them are not actually caused by the FC switches themselves, but rather by end devices that are not performing as expected and are causing fabric-wide congestion. These misbehaving end devices are referred to as “slow drains,” and they represent a well-known class of problem that is inherent to all lossless transports, including FC, DCB Ethernet (e.g., FCoE, RoCE) and even wastewater drain systems (i.e., plumbing).
For example, let’s assume that we have a pipe that is capable of transporting 16Gps (Gallons per second) but an outlet that only allows 8Gps. As long as the amount of water flowing through the pipe is less than or equal to 8Gps, the water will flow out of the pipe at approximately the same rate as it flows into it. As you would probably expect, since there is 16Gps of capacity in the pipe, the rate at which water flows into the pipe could exceed 8Gps for short periods of time and the pipe could act as a kind of buffer. You would also probably expect that if you were to consistently exceed an average of 8Gps, you would quickly notice the water level in the pipe starting to rise.
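To make the plumbing analogy concrete, here's a toy simulation of the pipe acting as a buffer. This is my own illustrative sketch; the function name and numbers are invented and don't come from any real tool:

```python
# Toy model of the plumbing analogy: a pipe with 16 Gps of capacity
# draining through an 8 Gps outlet. Illustrative only.

def simulate(inflow_per_sec, outflow_per_sec=8):
    """Track how much water accumulates in the pipe each second."""
    backlog = 0
    for t, inflow in enumerate(inflow_per_sec):
        backlog += inflow                     # water entering the pipe
        drained = min(backlog, outflow_per_sec)
        backlog -= drained                    # water leaving the outlet
        print(f"t={t}s in={inflow} drained={drained} backlog={backlog}")
    return backlog

# Short bursts above 8 Gps are absorbed by the pipe...
simulate([10, 6, 10, 6])    # backlog returns to 0
# ...but a sustained 10 Gps inflow makes the level rise by 2 every second.
simulate([10, 10, 10, 10])  # backlog ends at 8 and keeps growing
```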
The same sort of problem can happen within an FC SAN for a number of reasons, including:
It’s important to note that in all of the above scenarios, Buffer-to-Buffer flow control is used to ensure that there is sufficient buffering available on the receiver to store all frames put onto the wire by the transmitter. As a result, the transmitter experiences congestion whenever it is unable to transmit a frame due to a lack of buffer at the other end of the link. For example, as shown below, congestion has spread from the Target back towards the Initiator due to a mismatch in data rates.
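The credit mechanism can be sketched in a few lines. This is an illustrative model of Buffer-to-Buffer credit accounting, not real switch code (the class and method names are my own):

```python
# Minimal sketch of Fibre Channel Buffer-to-Buffer flow control.
# A transmitter may only put a frame on the wire while it holds credits;
# each R_RDY primitive from the receiver returns one credit.

class Link:
    def __init__(self, bb_credits):
        self.credits = bb_credits   # granted at login

    def try_send(self):
        """Return True if a frame could be transmitted."""
        if self.credits == 0:
            return False            # no receive buffer free: congestion
        self.credits -= 1           # one receive buffer now holds our frame
        return True

    def r_rdy(self):
        """Receiver freed a buffer and signalled R_RDY."""
        self.credits += 1

link = Link(bb_credits=2)
assert link.try_send() and link.try_send()   # both buffers in use
assert not link.try_send()                   # zero credits: must wait
link.r_rdy()                                 # buffer freed downstream
assert link.try_send()
```

The key point the model captures is that congestion manifests at the transmitter: it simply cannot send until the far end frees a buffer.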
The impact that the problem shown above will have on the rest of the fabric might not be obvious at first glance, so let’s see what happens with a slightly more complicated topology. I’ll start by adding another Initiator and Target pair (Initiator 2 and Target 2). Note that the receive queue on Switch A contains three frames: one for Target 1 and two for Target 2. Since there is ample space in the receive queue on Switch A, the Initiators attached to Switch B are probably not experiencing congestion very often.
However, if I slow down the rate at which Target 1 is pulling frames from its queue, you’ll notice that something very interesting happens, as shown below.
Note that Target 1 appears to be receiving frames at a rate faster than it can process them, and as a result its local queue fills up. This in turn causes the receive queue on Switch A to fill with frames destined for Target 1, since the frames destined for Target 2 are transmitted soon after they are received. In extreme cases, such as the one shown above, this congestion can spread all the way back to the Initiators and prevent them from transmitting frames. When things get to this point, we typically see two basic scenarios:
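The queue behavior just described is classic head-of-line blocking, and a toy FIFO model shows it nicely. This is an illustrative sketch only; real switches use more sophisticated queuing (e.g., virtual output queues), but the principle is the same:

```python
from collections import deque

# Head-of-line blocking sketch: the receive queue is a single FIFO, so
# frames for the healthy Target 2 can get stuck behind frames for the
# slow-draining Target 1. Purely illustrative.

def drain(queue, ready):
    """Forward frames from the head of the FIFO while the head frame's
    destination can accept one; return the frames actually forwarded."""
    forwarded = []
    while queue and ready.get(queue[0], 0) > 0:
        dest = queue.popleft()
        ready[dest] -= 1
        forwarded.append(dest)
    return forwarded

q = deque(["T1", "T2", "T2"])         # one frame for T1, two for T2
# Target 1 has stopped returning credits; Target 2 has plenty of room.
print(drain(q, {"T1": 0, "T2": 8}))   # nothing moves: T1 blocks the head
# Once T1 frees a single buffer, everything behind it drains as well.
print(drain(q, {"T1": 1, "T2": 8}))
```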
While the above scenarios may seem a bit extreme, they can and do happen in our customers’ environments. As you can imagine, when these events occur they can be very painful because they have such a wide impact and can be somewhat hard to troubleshoot. In addition, the reason that a particular end device (whether an Initiator or a Target) acts as a slow drain is not very well understood at this point in time. As a result, we decided to come up with a process to help identify, troubleshoot, resolve and prevent slow drain devices; an overview of this process can be found in EMC KB 464027 and is shown below for reference.
Our hope is that our customers will use this process to help them detect the presence of potential slow drain devices before they become a significant problem and impact their environments. There's also a ton of background information and links to deep dives on the different concepts involved in case you need it.
Special thanks to Dennis Makishima and Howard Johnson from Brocade for all of the work they have done to get to the bottom of the slow drain issues being observed and enabling us to start addressing these issues.
Thanks for reading!
Although this blog post isn’t an April Fools’ joke, I do greatly appreciate them when they are done well. As a result, I’m slightly in awe of the amount of thought Tony Bourke has put into his last three (2013, 2014 and 2015). Truth be told, I'm actually kind of honored that he sees me as the poster child for FC/FCoE (at least based on his tweets from April 1st). To save you a bit of reading, his posts all rest upon the idea that FC is yesterday’s technology, has one foot in the grave, and is desperately trying to prevent itself from being swallowed by the stork by any means necessary.
While Tony’s posts were all done in good fun and have given me new respect for the term getting Bourked, I will say that the idea of “FC is dying” is becoming a common theme in the industry; here are a few examples:
I had many fundamental issues with this article and I plan to do another post shortly that explains why RoCE, and even the new darling of the Ethernet tribe “Routable RoCE”, won’t kill off FC anytime soon.
While I do not want to downplay the importance of Server SAN, and I agree that its use in hyperscale environments clearly demonstrates its ability to scale, I think it’s a bit of a stretch to say it will displace all array-based storage (and hence the use of FC) in the timeline Wikibon has been promoting. The fundamental problem I have with their line of reasoning is this: they're comparing the current state of a known entity (i.e., Enterprise Storage Arrays) with an approach (i.e., Server SAN) that is theoretically possible but has yet to reach the peak of its hype cycle and, as a result, is still benefiting from the “Peak of Inflated Expectations”. I say this somewhat tongue in cheek, but I feel passionately that anyone who thinks Server SAN is a fully baked technology is fully baked. In addition, the Wikibon analysis seems to assume that the storage array vendors will “go gentle into that good night” and not introduce any new array-based technologies to differentiate themselves; I believe this is a fundamental flaw in most people’s logic on this topic. As a matter of fact, I was quite taken with the DSSD demo that Chad Sakac and Bill Moore put on at EMC World.
It may be early days for the DSSD folks, but I really like where they seem to be headed. The important point is this: they’re already providing something in an array form factor that cannot currently be implemented using the Server SAN approach. How does this relate to FC? Well, there’s talk of running NVMe over FC (i.e., FC-NVMe), and what nobody seems to realize is that many of the features the NVMe over Fabrics group has expressed an interest in (e.g., a “Name Server”, a reliable transport) are already fully supported by FC and have been for 15+ years. From where I’m standing, it seems like FC has a leg up on the competing transports here, especially if you don’t want to place your bets on iSNS.
BrassTacks - “An Introduction to Virtual Storage Networks”
Yeah, about that… I guess I’m guilty of promoting the idea that the future of storage connectivity is all Ethernet too, and I still think it probably is, but only for platform 3 Applications. Why not platform 2 or 2.5? Read on…
At EMC World 2015, during the “FC SAN, Should I stay or should I go?” Birds of a Feather (BoF) session on May 6th, I had the opportunity to speak with a group of 400+ EMC Customers. To start the session off, I pointed at the title of the presentation (FC SAN, Should I stay or should I go?) and asked:
“How many of you are here because you’re asking yourself this question?”
Answer: 90+% of the attendees
I then asked, “How many of you use FC as your primary means for accessing external storage?”
Answer: 90+% of the attendees.
Next “How many of you use IP Based storage?”
Answer: about 30%
And then “How many of you see moving to an IP based storage protocol (iSCSI or NAS) for your primary means of accessing your external storage capacity in the next 18-24 months?”
Answer: 3 people!
Finally “How many of you see moving to an IP based storage protocol (iSCSI or NAS) for your primary means of accessing your external storage capacity in the long term?”
Answer: 5 people!
I found this answer perplexing because we’ve been hearing from some big Enterprise accounts, as well as many in the networking industry, about “everyone’s plans” to move to a converged LAN and IP SAN. In fact, given the current amount of hype around this topic, I assumed that there would be a significant number of storage administrators in the BoF who were in the middle of planning this move, and I had structured the BoF presentation accordingly. However, during the BoF, instead of the active "give and take" kind of conversation about IP Storage Networks that I was hoping for, I got the same reaction as I had at previous EMC World BoF sessions when I talked about IP Storage: crickets and tumbleweeds…
Nevertheless, since I had prepared to talk about IP Storage Networks, I spent the better part of the next 20 minutes trying to figure out what it would take to move the attendees towards a converged IP network. At every opportunity, my audience was giving me kindly worded feedback, which I blew past because I was waiting to hear one of the “Actual meaning in American” phrases.
Once I finally got the hint that they were not interested in IP SAN (thankfully it only took 20 minutes and a redirect from the Director of the Connectrix BU; I can be stubborn), what started to emerge was a very interesting picture that was, in hindsight, consistent with the responses I’ve received over the past few years, especially when talking to these customers about infrastructure automation. The key questions and answers that tied all of this together for me were:
“How many of you are running OpenStack or some other Infrastructure Automation tool in production?”
Answer: 2 people!
“How many of you are evaluating OpenStack or some other Infrastructure Automation tool?”
Answer: Less than 10% (~30 people)
When I combined this with:
The first class of users consists of those that are being forced to evolve and provide services to their customers in an on demand fashion. A good example is the EMC Rubicon team that presented at the devops conference at EMC World. The kind of use case they need to support isn’t just do more with less, it’s do everything with no human involvement, in zero time, for as little as possible. Yeah, I know, as technologists we’re being force fed the idea that everyone involved in IT falls into this category and will eventually need to be like “Amazon” or “VMware vCloud Air”. And if you aren’t working on this right now you’re a loser and you’ll be out of a job by next year. And if your product team doesn’t have a long term plan to ship your products as collections of container based micro services that are all automatically installed and updated by Puppet and validated continuously by Jenkins which is necessary because you’ll be releasing / deploying new versions of your product every millisecond onto your customer’s OpenStack Zebra instances which doesn’t need to be reliable because fault tolerance is built into your application because well applications are just a bunch of cattle anyway, right?????
….Not that there’s anything wrong with this new approach, again, I think it represents the future for some types of applications. But right now, this type of automation (especially infrastructure automation) is in its very early stages, still really more of a framework than something that you can take out of the box and setup by yourself (without a ton of knowledge AND effort). Actually, strictly from an end user point of view, OpenStack specifically kind of reminds me of a very early, sparsely documented Linux distribution from 15+ years ago. That having been said, I think EVENTUALLY it (or something like it) will be just as important to infrastructure as Linux is to compute.
In any case, this new approach has a ton of potential, but it’s unstable, and making infrastructure stable is really hard to do, especially if you need to do so in an automated fashion. The example I think of is this: how do you automate the creation of a completely redundant path to your storage array? (Just noodle on that one for a few minutes; if you figure it out, let me know, we’re hiring.) All of this means that running platform 2 Applications in this kind of environment would be challenging, because they need stability; for the most part, they rely on the underlying infrastructure to provide redundancy.
This leads me to the second group of people. Perhaps unsurprisingly they’re the people who need to support these second platform applications and who I am now affectionately referring to as the “Fibre Channel Tree Huggers”.
These FC Tree Huggers are people who value “Stability”, “High Availability”, and “Predictable performance” and also know that if something happens to one of the applications under their control, it could have significant ramifications not only financially but legally. Keep in mind, some of these people are responsible for maintaining infrastructures that could have life and death consequences were an outage to occur. People of this class are also responsible for the vast amount of infrastructure that keeps our society, financial institutions and even our national defense operational. Is it any wonder that they value stability over automatability? If I had that kind of responsibility, I'd probably be clinging to the FC Tree too, because in my (and more importantly their) opinion, physical FC is a better transport for storage than Ethernet.
Keep in mind I’ve spent the majority of the previous 7 years working with FCoE (wrote two books on the topic) and have spent the past 3 years focused on IP SANs and Virtual Networks. So I could probably ramble on endlessly about why I feel this way, but I’ll boil it down to the FC advantages that seem to resonate most with the FC Tree Huggers (my “peeps”).
Top 8 reasons to hug the FC Tree:
This is the most frequently used justification (by far) when we speak to customers who don’t want to converge their LAN and SAN. While there are a lot of different dynamics at play behind this seemingly straightforward-sounding reason, they boil down to a general distrust between the Network and Storage guys. I know this reason is an old saw; we’ve been hearing about this concern since before FCoE was even released. And I think many blew this concern off as something that the management layer could somehow organizationally resolve (at least eventually). Although I am aware of instances where management has been able to get the Network and Storage guys to cooperate, the majority of these teams don’t, and this is the root of many of the other issues related to isolation.
A great example of where the lack of trust between Network and Storage teams poses a tangible problem is in the area of QoS guarantees and performance monitoring. For the sake of this example, let’s assume that the network team has agreed to dedicate bandwidth (call it 40%) to storage traffic across a typical three-layer network topology, or, to be generous, a leaf/spine topology. When users start complaining about bad performance and the Compute guys start pointing at the Storage guys as the root cause, how are the Storage guys going to be able to troubleshoot the SAN portion of the converged network? According to a very reliable source, it’s technically possible to configure RBAC to give the storage team access to only their ports, and even to give them read-only access, but what about visibility into the shared links (e.g., Leaf to Spine)? How are they going to clear counters, or drop a suspected problem port in the course of troubleshooting? Again, in the vast majority of cases, there is no way the Storage guys would even have visibility into the network.
Another interesting concern about converged networks is the VLAN hopping issue we discovered in the lab.
None of the above is a problem if the Storage team has dedicated network resources (either FC or IP based would be fine).
If a port misbehaves at the physical layer (FC-0, FC-1), the transport protocol layer (FC-2), or the storage protocol layer (FC-4), an FC switch can take the port offline automatically. Yes, an Ethernet switch can do this too (e.g., BPDU guard), but generally speaking not for a transport protocol layer or storage protocol layer violation.
So why does this matter? Well, it’s not terribly common, but one example that I've personally worked on a couple of times is an HBA that malfunctions in a way that causes a denial-of-service attack against the storage port it is logged into. When this happens, an FC switch may be able to detect the condition and shut the port down before it can impact the other devices that happen to be using the same storage port.
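As a rough illustration of what such port fencing amounts to, here's a hypothetical sketch. The function, counter names and threshold are all invented for the example and don't correspond to any vendor's implementation:

```python
# Hypothetical port-fencing logic: if a port's error counter climbs
# faster than a threshold between polls, flag it to be taken offline
# before it disrupts devices sharing the same storage port.
# All names and numbers here are illustrative.

def fence_misbehaving_ports(counters, previous, threshold=5):
    """Return the ports whose error-count delta exceeds the threshold."""
    fenced = []
    for port, errors in counters.items():
        delta = errors - previous.get(port, 0)
        if delta > threshold:
            fenced.append(port)     # a real switch would disable the port
    return fenced

prev = {"port1": 2, "port2": 0}
now = {"port1": 3, "port2": 40}     # port2's HBA is flooding the fabric
assert fence_misbehaving_ports(now, prev) == ["port2"]
```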
Bridge collapses aren’t funny but the following picture of the I-5 bridge collapse in Washington State illustrates why many people who are very concerned about catastrophic failures prefer an air gap design.
In the case of a converged SAN, again the concern about the interaction between the Network and Storage teams comes into play. The storage teams are concerned that their paths won’t actually be redundant and that upgrades will not be staged (one leg at a time).
As I mentioned in my original blog post on air gaps, there is a technical solution to this problem, but again it requires coordination between the Network and Storage teams.
Slow drain devices and congestion spreading are facts of life for lossless networks like FC or DCB Ethernet (used to transport FCoE, lossless iSCSI and RoCE). For an in-depth description of the problem, refer to the EMC Networked Storage Concepts and Protocols TechBook and look at the Congestion and Backpressure section. The good news is that with FC, both switch vendors have been working on solutions to these problems for years; not so with Ethernet. Brocade especially has kicked things up a notch with FOS 7.4 and their slow drain device quarantining feature.
OK, not that kind of centralized name service... But while we’re on it, George, please finish writing books 6 and 7 and PLEASE stop killing all of my favorite characters!
In addition to enabling the “self-documenting” and “network-centric management model” discussed below, the distributed FC Name Service provides a very simple way for users to select end devices and add them to a software-defined network (more commonly referred to as an FC zone). Zoning is both a blessing and a curse for FC. It’s a blessing because it gives users the isolation they need, but it’s a curse because it takes a bit of skill to administer zones properly and this administration is difficult to automate. You can do something like this with Virtual Networks, which is exactly why we’re digging into them.
When I first pitched a detailed overview of TDZ at EMC World 2011, many of the concerns I received were in regard to a perception that users would lose the ability to define zone names. Actually, this wasn’t just a perception; that was pretty much my intention from the get-go. In any case, as I talked about this concern with end users, many of them explained that their zone names all follow a specific naming format (unique to their environment) that includes bits of information such as the application, the hostname and the storage port interface, and that this information is of critical importance during troubleshooting. The idea is that an end user calls the storage admin with a concern about a particular application or host, and because of the zone naming convention, it’s very easy for the storage admin to use the zone name to locate the relevant ports and their WWPNs. Once the WWPNs have been identified, the storage admin can verify that the appropriate ports are logged into the fabric and then drill into the specific physical interfaces involved to look for errors. The same cannot be done with IP Storage.
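To illustrate the workflow, here's a small sketch assuming a hypothetical app_host_arrayport naming convention. This exact format is my invention for the example; every shop defines its own:

```python
# Sketch of the troubleshooting workflow: recover the application, host
# and array port directly from a conventionally named zone. The naming
# convention (app_host_arrayport) is hypothetical.

def parse_zone_name(zone):
    """Split a conventionally named zone into its components."""
    app, host, array_port = zone.split("_", 2)
    return {"app": app, "host": host, "array_port": array_port}

# A call about the "payroll" app leads straight to the relevant ports:
info = parse_zone_name("payroll_hr-db01_VMAX1234-FA7e0")
print(info["host"], info["array_port"])
```

From there the admin would look the ports up in the Name Service to get WWPNs and check login state, as described above.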
I wrote a detailed blog post on this topic, so I won’t repeat all of those details here. However, the key takeaways are:
FC and FCoE are Network-centric. Both protocols rely on the fact that the network controls what each end device has access to. A Network-centric approach is probably better suited for large organizations that need centralized control of access to storage resources.
iSCSI is End-Node-centric. iSCSI relies on the fact that the network will allow communication between the iSCSI Initiator and whatever iSCSI Target the Server Admin points the Initiator at. Since control is managed at each individual end point, the end devices have evolved so that they will only discover what they are told to discover. The bottom line is iSCSI is probably better suited to smaller organizations that do not need centralized control of access to storage resources.
Forward Error Correction (FEC) is required on 16G FC links but not on 10GbE. So why does this matter? Well, SCSI-FCP (SCSI Fibre Channel Protocol) was designed to run over a practically lossless transport and, as a result, there’s no retransmission capability built into FC. This is usually fine because, historically, frame loss due to drop or corruption has been very rare. The problem is, as link speeds increase and as cabling infrastructures age, we’re noticing that bit errors, due either to dirty fiber terminations or to exceeding the maximum distance supported for a given fiber type at a given speed, are starting to increase.
With FC, when a bit error occurs, if it happens to land in the middle of a frame, the frame will be discarded by the receiving interface. This can have a tremendous impact on the SCSI protocol, sometimes resulting in the need for the SCSI timeout to expire (30-60 seconds by default) before the IO can be retried.
FEC helps prevent this by correcting bit errors and this prevents the frame from being discarded by the receiving interface. This means that the SCSI timeout scenario is much less likely to come into play when you’re using 16G FC.
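Some back-of-the-envelope math shows why this matters. A full-size FC frame is 2148 bytes on the wire, so even a modest bit error rate translates into a measurable frame discard rate when there's no FEC to correct it (illustrative numbers only):

```python
# Back-of-the-envelope illustration of why bit errors matter to SCSI-FCP:
# at a given bit error rate (BER), what fraction of full-size FC frames
# (2148 bytes on the wire = 17184 bits) arrive with at least one errored
# bit and get discarded when there is no FEC?

def frame_error_rate(ber, frame_bits=2148 * 8):
    """P(frame contains >= 1 bit error) = 1 - (1 - ber)^frame_bits."""
    return 1 - (1 - ber) ** frame_bits

# At the FC spec's nominal 1e-12 BER, discards are vanishingly rare...
print(frame_error_rate(1e-12))   # ~1.7e-8 per frame
# ...but a marginal link running at a 1e-9 BER discards roughly one frame
# in 58,000, and each discard can stall an IO until the SCSI timeout fires.
print(frame_error_rate(1e-9))
```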
Since FEC is not supported on 10/40GbE, you are more exposed to these kinds of problems when using FCoE at 10GbE. iSCSI and NAS both use TCP, which retransmits lost segments, so this problem wouldn’t be much of an issue in an IP SAN.
It’s worth mentioning that according to a friend who’s familiar with IEEE and the standardization of 25GbE and beyond, it appears as if 25GbE SR will be supporting FEC, so this won’t be an issue for Ethernet once you move to 25GbE.
So which is better, FC or Ethernet? It depends on what you need…
If you’re working in a traditional enterprise data center and you need stability more than automatability, then embrace your inner FC tree hugger and continue to support the connectivity requirements of your existing platform 2 Applications using the best transport available: FC. FC and the companies that provide FC solutions are not going away any time soon, and based on the amount of innovation we've seen recently from our switch partners, you can bet they’re not going to stop anytime soon either. As a result, expect to see 32G/128G FC shortly and new protocols, such as FC-NVMe, added as it makes sense to do so.
If you’re working in an environment where automation is king and the use of commodity homogeneous HW is a foregone conclusion, then you’re probably already using either a Server SAN approach or an IP SAN of some kind. However, if you find that you need to use NVMe at some point in the future, I would urge you to at least consider FC and see what it can offer you once the protocol has made it through the standardization process.
Thanks for reading!
It's almost time for EMC World (May 4th - 7th) and I’m happy to say that once again I’ve been honored with a couple of highly coveted EMC World speaking slots! Based on the very positive feedback I received from last year, I’ll be sticking to a similar format and focusing on:
Additional details about each session are provided below.
Title: SAN & IP Storage Networking Technologies & Best Practice Update
Dates and times: Monday, May 4, 4:30 PM - 5:30 PM and Wednesday, May 6, 8:30 AM - 9:30 AM
What’s new with storage connectivity?
Storage connectivity (Protocol update)
I'll also be moderating the following Birds of a Feather (BoF) session.
Title: "FC SAN: Should I stay or should I go…"
Joining me during the BoF will be:
Date and time: Wednesday, May 6, 1:30 PM - 2:30 PM
Agenda: This is an interactive session where we ask our Customers for feedback on particular topics. This year's BoF session abstract is:
Refreshing your infrastructure? Wondering “should I converge my network now and go with an IP based Storage Area Network or should I wait just a little longer to see what happens in the industry”? Have you started the process of migrating to an IP based SAN and uncovered some of the IP SAN pitfalls? Wondering why you can’t connect your vmknics to your NSX Logical Switches? Thinking about how to support the connectivity needs of ScaleIO, VMware Virtual SAN or XtremIO? If you answered yes to any of these questions, join us for a candid discussion focused on storage networks and their intersections with Ethernet, IP and Network Virtualization. We’ll provide some insight into the trends we’re seeing, and you can feel free to share your thoughts and concerns or just take the opportunity to vent.
Hope to see you there!
I'm still working on the fourth installment of the Virtual Storage Network (VSN) blog post series, but I wanted to share this link to a youtube video that shows a very simple (2 port) VSN demo that we'll be showing next week at EMC World. The topology that I'm using in the video is shown below.
The demo shows a File client in "VM1 Tenant 2" being connected to a File Share on the VNX via a VXLAN tunnel. The VSN-CS (which will be the subject of yet another blog post) was used to create the connection between "Tenant Y VSN" and "VLAN Y".
Thanks for reading and I hope to see you at EMC World next week!