Thursday, 31 October 2013

Introduction to FCoE



  • SCSI protocol: carried over a network transport via a serial implementation. The two primary transports today are FC and IP.
  • Fibre Channel provides high-speed transport for the SCSI payload via an HBA. FC overcomes many shortcomings of parallel I/O and combines the best attributes of a channel and a network.
  • Storage Protocol Technologies:
    • FCP
    • iSCSI
  • FC has many similarities to IP (TCP), but FC is flow controlled hop by hop (buffer-to-buffer credits) to maintain a no-drop fabric; there is no end-to-end flow control at the FC level, only at the SCSI level. This matters because SCSI has a timeout of 60s and no rapid retransmission: drop one frame and the whole SCSI operation is corrupted and must be retried.
  • FC can run a lot of parallel connections.
  • E_Port: expansion port, used for ISLs between switches.
  • TE_Port: like 802.1Q trunking for Ethernet, it carries multiple VSANs over one ISL (VSAN trunking).
  • N_Port: node port (server HBA, storage array, etc.); it connects to an F_Port on the switch.
  • NP_Port: found on a switch running in NPV mode, i.e. a switch that emulates a host or proxy. It logs in as an N_Port and reduces a lot of management.
  • WWN: burnt-in unique addresses assigned by the manufacturer to fabric switches, ports, and nodes; they are unique and registered with the IEEE. That is where the similarity to MAC addresses ends: the WWN does not appear in every FC packet, it is only used in a few frames to uniquely identify the sender. What you actually see in src/dst is a dynamic address (the FC_ID).
  • FC uses something similar to DHCP: switches dynamically assign FC_ID addresses to N_Ports. The 24-bit FC_ID is divided into Switch Domain (8 bits), Area (8 bits), and Device (8 bits), which makes routing decisions easy (the switch-topology model). See the sketch after this list.
  • FC-2 hierarchy: an Exchange is split into Sequences, and each Sequence into frames; e.g. a 32K exchange becomes 8K sequences, each carried as ~2K frames. This makes it easy to fire multiple I/Os in parallel, because each exchange has a unique OX_ID (exchange ID), so exchanges can be load-balanced across ISLs.
  • Cisco is the only vendor that supports FC port channels; together with the trunking capability, this really allows Cisco differentiation.
  • VSANs: for the same reason we have VLANs, we have VSANs. Each VSAN gets its own instance of the shared services running in the FC environment, so one physical fabric can be partitioned instead of duplicated.
  • Used for storage tiering. The Nexus 5K and 7K and UCS have this, as well as MDS.
  • The initiator has the HBA. VSANs give east-west separation.
  • Fabric Shortest Path First: Just like OSPF. FSPF routes traffic based on destination domain ID.
  • Storage security is a major daily activity. Zones are bidirectional ACLs whose members are identified by WWN, and the fabric enforces in hardware whom an initiator can communicate with. Zone members can only see and talk to other members of their zone. Zones belong to a zoneset, and a zoneset must be "active" to enforce zoning; there is only one active zoneset per fabric (per VSAN).
  • When a device first physically joins the fabric and negotiates speed, the initiator performs a FLOGI: it sends frames to the switch (not the target) saying "I am an initiator, I need to register with the Name Server, and here is my WWN," and the switch grants it an FC_ID. Now it has an address it can use to send frames; src/dst in frames is the FC_ID, NOT the WWN.
  • The initiator then asks the FC switch which devices it can communicate with, and the fabric's zoning database determines which devices those are. Then it performs a PLOGI, which is done end to end with the target. The target goes through the same steps as the initiator, in parallel.
  • What is NPIV? Before we do NPV, we need to understand NPIV. N_Port ID Virtualization allows multiple FC_IDs to be allocated to a single port; it is a feature on the core director. On a VMware server, we can assign an FC_ID to each VM, which allows tracking each one separately in the fabric.
  • Then what is NPV? Think of a 5K with 2Ks and lots of end devices. NPV mode turns that switch into an initiator, essentially a host: it logs in as an N_Port rather than as an E_Port, so there is no ISL to the MDS/NPIV switch and no need to run the shared fabric services, and it proxies all of the real servers plugged into it. In a ToR design with hundreds of UCS and 5Ks going into the MDS, NPV mode on the ToR switches really reduces that overhead. Use it with NPIV-capable core directors, which could be an MDS or a 7K.
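
Quick Python sketch (my own illustration, not from any Cisco tool) of the FC_ID split mentioned above: the 24-bit address breaks into Domain/Area/Device bytes, and inter-switch routing only has to look at the Domain byte.

def split_fcid(fcid: int):
    """Split a 24-bit FC_ID into its (domain, area, device) bytes."""
    domain = (fcid >> 16) & 0xFF   # switch domain: used for routing decisions
    area = (fcid >> 8) & 0xFF      # area within the switch
    device = fcid & 0xFF           # device/port within that area
    return domain, area, device

# Example FC_ID 0x7A0102 -> domain 0x7a, area 0x1, device 0x2
print([hex(b) for b in split_fcid(0x7A0102)])
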
Advancements in Ethernet:
  • Adoption of 10G is a major driver, and it is ramping to 40GE. This puts a nail in the coffin of native FC speeds: it beats 8G or 4G FC, and every DC now has 10G. Once 10G to the server happens, FC suffers further, since FC requires a separate PCI card, which is a power-hungry device.
  • Standards for FCoE: FCoE is defined by the T11 FC-BB-5 standard (FC on other network media) plus IEEE 802.1 DCB. DCB includes PFC (lossless Ethernet, 802.1Qbb) + ETS (priority grouping, 802.1Qaz) + DCBX (configuration verification, also 802.1Qaz).
  • PFC: Priority Flow Control (802.1Qbb), available on the 5K, 2K, 7K, and MDS. Gives the ability to pause just the FCoE traffic class, and to accept pause frames.
  • ETS: Enhanced Transmission Selection: the ability to create groups of protocols and assign bandwidth to each group, e.g. "I want to reserve 80% of the wire for FCoE traffic and the rest for Ethernet," down at L2.
  • DCBX: 802.1Qaz; both ends go through the DCBX negotiation to verify that they support PFC, ETS, and FCoE before they send any FCoE packet.
  • It is a standard: all of these are "technically stable," the term used by the standards committee to say a draft has passed the milestone where vendors can start making products. So FCoE is a standard now.
  • You can use twinax cable for FCoE: SFP+ CX-1 copper (SFF-8431), <10m, with the cable and SFPs physically one component. It drives down power and cost significantly, only ~0.1W per port.
  • CNA: HBAs that carry both FCoE and LAN traffic out of the same port, on a single chip. FCoE can also be done with a software driver; you can run FCoE on an Intel or Broadcom chip.
FCoE Technology/Unified Fabric:
  • Completely based on the FC model: WWNs, FC_IDs, zoning, name server, RSCN. Compare this with iSCSI, which is a completely different model from FC, with very different management and tools.
  • yet another overlay network.
  • Products: 2K, 5K, N7K (32-port F-series module), MDS 9500 (8-port FCoE card).
  • FCoE is really two different protocols: FCoE itself (the data plane) and FIP (FCoE Initialization Protocol), the control-plane protocol.
  • FIP is a fairly short-lived protocol. It does VLAN discovery, FCF discovery (FCF = Fibre Channel Forwarder, the FC switch inside an Ethernet switch), and FLOGI/FDISC: the host needs to log in and get an FC_ID, which it will use inside its FC packets. Once FIP completes, it hands off to FCoE.
  • 2180-byte frames ("baby jumbo" frames in an Ethernet environment); a rough sanity check of that number follows.
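
The arithmetic below is my own rough accounting (the exact per-field breakdown varies by presentation, so treat the field sizes as assumptions); it shows why a max-size FC frame pushes the FCoE frame to about 2180 bytes.

# Rough, assumption-laden sketch of the ~2180-byte FCoE frame size
FC_HEADER = 24      # FC-2 frame header
FC_PAYLOAD = 2112   # maximum FC data field
FC_CRC = 4
FC_SOF_EOF = 8      # start/end-of-frame delimiters carried along
fc_frame = FC_HEADER + FC_PAYLOAD + FC_CRC + FC_SOF_EOF          # 2148

ETH_HEADER = 14     # dst MAC + src MAC + EtherType
FCOE_HEADER = 14    # FCoE version/reserved fields
ETH_FCS = 4
fcoe_frame = fc_frame + ETH_HEADER + FCOE_HEADER + ETH_FCS       # 2180

print(fc_frame, fcoe_frame)   # 2148 2180 -> hence the baby jumbo MTU
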

IP Services: Syslog, WCCP, ICMP

SYSLOG:
  • uses UDP port 514
  • use the logging <host> command and, optionally, logging trap <level>.
  • default facility of local7
  • e.g.: service timestamps log datetime localtime → logging 192.168.1.100 → logging monitor informational.
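
A minimal Python sketch (illustrative only; the collector IP and hostname reuse or extend the example above) of what a syslog sender does on the wire: one UDP datagram to port 514, with PRI = facility * 8 + severity, here local7 (23) to match the IOS default facility.

import socket

COLLECTOR = ("192.168.1.100", 514)   # syslog collector from the example above
pri = 23 * 8 + 6                     # local7.informational -> <190>
msg = f"<{pri}>router1: %SYS-6-DEMO: test message"   # "router1" is illustrative
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    s.sendto(msg.encode(), COLLECTOR)
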

WCCP:
  • uses UDP port 2048
  • up to 32 content engines can communicate with a single router using WCCPv1.
  • The content engine with the LOWEST IP address is elected as the lead engine.
  • With WCCPv1, only ONE router can redirect traffic to a content engine or cluster of content engines, and it ONLY supports HTTP traffic (TCP port 80).
  • With WCCPv2, multiple routers and multiple content engines can be configured.
    • Supports TCP and UDP traffic other than port 80, including FTP caching, FTP proxy, web caching for non-80 ports, RealAudio, video, and telephony.
    • supports multicast.
    • provides MD5 security: ip wccp web-cache password <password>
    • load distribution.
    • transparent error handling.
    • the default version is WCCPv2.
  • Configuring:
    • globally: ip wccp web-cache group-address <ip> password Cisco
    • redirecting traffic out to the content engine: (int) ip wccp web-cache redirect out
    • excluding inbound traffic on an interface from redirection: (int) ip wccp redirect exclude in
ICMP:
  • Echo Request: sent by a ping from the host to test node reachability.
  • Echo Reply: indicates the node can be reached successfully.
  • Redirect: sent by a router to the source host to steer it toward a more efficient route.
  • Time Exceeded: sent by a router when an IP packet's TTL field reaches zero.
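
To make the Echo Request concrete, here is a small Python sketch (my own, not from the source) that builds the ICMP echo packet ping sends: type 8, code 0, plus the standard Internet checksum. Actually sending it needs a raw socket and root privileges, so only construction is shown.

import struct

def icmp_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words (the standard Internet checksum)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    total = (total >> 16) + (total & 0xFFFF)   # fold the carry bits back in
    total += total >> 16
    return ~total & 0xFFFF

def build_echo_request(ident=1, seq=1, payload=b"ping"):
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)   # checksum 0 for now
    csum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

print(build_echo_request().hex())
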

IP Services: SNMP, NTP

NTP:
  • NTP server: (global) ntp master 7 (makes the router a stratum-7 server)
  • NTP symmetric active mode: the router/switch mutually synchronizes with another NTP host; configured with the ntp peer command: (global) ntp peer 10.1.1.1
  • NTP broadcast client: listens to NTP broadcasts on the Ethernet: (int) ntp broadcast client
  • NTP client: configure "ntp server 10.1.1.1"
  • Authentication on NTP:
    • ntp authentication-key 1 md5 <key-string>
    • ntp authenticate
    • ntp trusted-key 1
  • under the interface, configure "ntp broadcast" (broadcasts the time)
  • show ntp associations
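
A minimal SNTP client sketch in Python (illustrative; the server IP reuses the ntp server example above). The first byte 0x1B encodes LI=0, version 3, mode 3 (client); the server's transmit timestamp sits at bytes 40-43 and counts seconds since 1900, so subtracting the 70-year offset gives Unix time.

import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800            # seconds between 1900 and 1970
server = ("10.1.1.1", 123)               # NTP server from the example above
packet = b"\x1b" + 47 * b"\x00"          # 48-byte mode-3 client request
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    s.settimeout(2)
    s.sendto(packet, server)
    data, _ = s.recvfrom(48)
secs = struct.unpack("!I", data[40:44])[0] - NTP_EPOCH_OFFSET
print(time.ctime(secs))
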

 SNMP
  • SNMPv1: simple authentication with communities, used MIB-I
  • SNMPv2: removed requirement for communities, added GetBulk and inform messages, MIB-II
  • SNMPv2c: only difference, allowed SNMPv1 style communities with SNMPv2
  • SNMPv3: better security, backward compatibility to communities.
  • communities: read-only, read-write, trap.
  • Inform requests are acknowledged with an SNMP response packet.
  • Messages:
    • Response: responds to information in Get and Set requests.
    • Inform: A message used b/w SNMP managers to allow MIB data to be exchanged about agents they both manage.
  • MIBS:
    • RMON is outside MIB-II
  • SNMPv3 adds authentication and encryption: MD5 or SHA creates a message digest for each protocol message (authentication), and DES encrypts messages (privacy).
  • SNMP embedded event manager
    • automatic recovery actions are performed without the need to fully reboot the routing device
    • allows event management capability directly inside the Cisco IOS devices.
    • the action snmp-trap command requires "snmp-server enable traps event-manager" to be configured.
    • two types of EEM policy: applets and scripts
    • E.g: event manager applet IOSWD_Sample1
      • event ioswdsysmon sub1 cpu-proc taskname "task 1" op ge val 25 period 10 (triggers the applet when average CPU usage is greater than or equal to 25% for 10 seconds)
      • action 1.0 syslog msg “IOSWD_Sample1 Policy Triggered” (generates syslog notification)

CCIE: Routing To Next-hop vs Routing To Interface

Concept learnt from IE’s Vol5.0 workbook for “IP Routing”
 

  • When routing to a NEXT-HOP value, the router performs L3-to-L2 resolution (ARP) on the next-hop address (e.g. ip route 150.1.4.4 255.255.255.255 155.1.146.4). So in the ARP table, you'll see the MAC for the next-hop IP address, 155.1.146.4.
  • When routing to an INTERFACE, the router performs L3-to-L2 resolution on the FINAL destination (not on the next hop), e.g. ip route 150.1.6.6 255.255.255.255 fa0/0 configured on Router1. Let's assume 150.1.6.6 is a loopback interface on Router6 and Router6 is connected to the LAN via Fa0/6. With that route configured on R1, R1's ARP table shows the MAC address of the Fa0/6 interface for R6's loopback (i.e. 150.1.6.6). This works only because proxy ARP is enabled by default on routers: R6 answers the ARP for its loopback. If we disable proxy ARP on Fa0/6, you'll notice you can no longer ping R6's loopback, since the router does not know the correct L2 address to use when building the L2 frame. You'll see an "encapsulation failed" message in the debugs:
*Mar  5 02:18:49.733: IP ARP: creating incomplete entry for IP address: 150.1.6.6 interface FastEthernet0/0
*Mar  5 02:18:49.733: IP ARP: sent req src 155.1.146.1 000f.f756.6560,
                 dst 150.1.6.6 0000.0000.0000 FastEthernet0/0
*Mar  5 02:18:49.733: IP: s=155.1.146.1 (local), d=150.1.6.6 (FastEthernet0/0), len 100, encapsulation failed.

  • Resolution: 1) change the static route to point at a next hop, so the router ARPs for the next hop rather than for the final destination; or 2) statically configure the MAC address to use when sending packets to R6's loopback: router(config)# arp 150.1.6.6 <mac> arpa.
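
A toy Python model (my own, not IOS code) of the difference just described: with a next-hop route the router ARPs once for the next hop; with an interface route it ARPs for each final destination, which only succeeds if the downstream router answers via proxy ARP.

def resolve_l2(next_hop, dst, proxy_arp_on_neighbor):
    """Return what the router would ARP for, and whether encapsulation works."""
    if next_hop:                      # ip route ... 155.1.146.4
        return f"ARP for {next_hop} -> neighbor's MAC, frame built"
    # ip route ... FastEthernet0/0: ARP for the final destination itself
    if proxy_arp_on_neighbor:
        return f"ARP for {dst} -> proxy ARP reply with neighbor's MAC, frame built"
    return f"ARP for {dst} -> no reply: encapsulation failed"

print(resolve_l2(None, "150.1.6.6", proxy_arp_on_neighbor=False))
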

An Introduction To IP Multicast

All hosts that are connected to a LAN must use a standard method to calculate an L2 multicast address from the L3 multicast address and assign it to their NICs.

IGMP provides communication between hosts and a router connected to the same subnet. CGMP and IGMP snooping help switches learn which hosts have requested to receive the traffic for a specific multicast application (i.e. switches learn which ports would like to receive the multicast traffic).

Some multicast routing protocols, which allow routers to forward multicast traffic from multicast servers to hosts: Distance Vector Multicast Routing Protocol (DVMRP), Multicast OSPF (MOSPF), PIM-DM, and PIM-SM.




Multicast is UDP-based (unreliable). Some multicast protocol mechanisms occasionally generate duplicate packets and deliver packets out of order.
   
The first 4 bits of the first octet for a class D address are always 1110.
    Range: 224.0.0.0 to 239.255.255.255 (no need for masks); the only requirement is that the first 4 bits are 1110.
    Permanent multicast groups: 224.0.0.0 – 224.0.1.255
        for non-routing purposes: 224.0.0.0 – 224.0.0.255 (e.g. 224.0.0.1 [all multicast-capable hosts on a local network], 224.0.0.2 [all multicast-capable routers on a local network], and 224.0.0.4 [DVMRP routers])
        for when packets need to be routed: 224.0.1.39 (RP-announce) and 224.0.1.40 (RP-discovery), used by Auto-RP.
    Used with Source-Specific Multicast (SSM), 232.0.0.0 – 232.255.255.255
        the purpose: to allow a host to select a source for the multicast group. This makes multicast routing efficient, lets a host select a better-quality source, and helps network admins minimize DoS attacks. ONLY IGMPv3-capable hosts can use this feature.
    GLOP: 233.0.0.0 – 233.255.255.255
        can be used by anyone who owns a registered ASN to create 256 global multicast addresses. Uses the value 233 in the first octet and the ASN in the second and third octets. E.g. ASN 5663 = 0001 0110 0001 1111 in binary; the first eight bits equal 22 and the last 8 bits equal 31, giving 233.22.31.0 to 233.22.31.255. (See the quick check below.)
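
        A quick check of that GLOP arithmetic in Python (my own sketch): the 16-bit ASN simply becomes the middle two octets of a 233.x.y.0/24 block.

def glop_block(asn: int) -> str:
    assert 0 <= asn <= 0xFFFF, "GLOP covers 16-bit ASNs only"
    return f"233.{(asn >> 8) & 0xFF}.{asn & 0xFF}.0/24"

print(glop_block(5663))   # 233.22.31.0/24, matching the worked example above
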
    Private: 239.0.0.0 – 239.255.255.255
    Multicast addresses for “transient” group: remaining multicast addresses are transient groups. Enterprise is expected to release this after use.
    Mapping IP multicast addresses to MAC addresses (see the sketch below):
        e.g. 228.10.24.5: the first 4 bits of the IP (1110) are replaced by 01-00-5E (the first 6 of the 12 hex digits)
        the next 5 bits of the binary IP are ALWAYS replaced with a 0 bit
        so the MAC so far is 01-00-5E-0...
        the last 23 bits of the binary IP go into the last 23 bits of the multicast MAC
        here that is 0A-18-05
        result: 01-00-5E-0A-18-05
        because 5 bits are thrown away, the possibility of duplicate addresses is there!!
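
        Here is that mapping as a small Python function (my own sketch): keep the low 23 bits of the group address and prefix them with 01-00-5E. Because 5 bits of the IP are discarded, 32 different groups share each MAC, which is exactly the duplicate-address caveat above.

import ipaddress

def mcast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its L2 multicast MAC address."""
    ip = int(ipaddress.IPv4Address(group))
    mac = (0x01005E << 24) | (ip & 0x7FFFFF)   # 01-00-5E + low 23 bits of the IP
    return "-".join(f"{(mac >> s) & 0xFF:02X}" for s in range(40, -8, -8))

print(mcast_mac("228.10.24.5"))   # 01-00-5E-0A-18-05, as derived above
print(mcast_mac("229.10.24.5"))   # the same MAC: a 32:1 overlap
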
    Three different tools, namely CGMP, IGMP snooping, and RGMP, allow switches to optimize their multicast forwarding logic by answering the question of which ports in a broadcast domain should receive the traffic.
    IGMP:
        IGMP messages are sent in IP datagrams with IP protocol number 2 and IP TTL set to 1.
        IGMP packets pass only over a LAN and not forwarded by routers due to TTL.
        2 Goals: to inform mcast router that a host wants to receive packets from a specific group and to inform local multicast routers that a host wants to leave a mcast group.
        IGMP, b/w hosts and router.
        IGMP v2 packet:
            Type (8 bit) has four message types: Membership query, version 1 membership report (for backward compatibility), Version 2 Membership report, Leave Group.
            Max Response Time: expressed in tenths of a second, default 100 (10 seconds). Allows tuning the response time for the Host Membership Report.
            checksum
            Group Address: set to 0.0.0.0 in general query and to group address in Group specific query.
        REASONS for v2: a better "Leave" mechanism to shorten leave latency; Group-Specific Query messages that let the router query a single group instead of all groups; the MRT field; and a querier election process, which provides the method for selecting the preferred router for sending Query messages when multiple routers are connected to the same subnet.
            An IGMPv2 router sends an IGMPv2 Query message every 125 seconds.
        Multicast hosts must listen to the well-known 224.0.0.1 multicast group address to participate in IGMP and to receive mcast queries.
        by setting the group address to be 0.0.0.0 the router is asking, “does anyone want to receive multicast traffic for any group?” Host responds with the IGMP report messages to inform Router.
        Host sends, “solicited host membership report” and “unsolicited host membership report”
        Multicast router only needs 1 report to forward traffic out its interface whether there are 1 or 200 users.
        IGMPv2 uses the MRT timer (the "query response interval") to suppress many unnecessary IGMP reports. Report suppression: when a host receives a report sent by another host for the same multicast group it was planning to report for, it does not send its own. A 3-second MRT is expressed as 30. Hosts pick their actual delay randomly between 0 and the MRT (see the toy simulation after this IGMP list).
        An IGMPv1 router takes 3 minutes to conclude that the last host on the subnet has left a group; an IGMPv2 router takes only about 3 seconds!
        IGMPv2 leave group and IGMPv2 Group-Specific query message work together.
        The Last Member Query Interval defaults to the MRT of 10 (1 second), and the router sets the Last Member Query Count to 2, so the leave latency is usually less than 3 seconds.
        IGMPv2 querier: when multiple routers are connected to a subnet, the router with the LOWEST IP address on the subnet is elected as the IGMP querier. The others run the "Other Querier Present Interval," whose default is 255 seconds because the default general IGMPv2 query interval is 125 seconds and the default query response interval is 10 seconds (2 × 125 + 10/2 = 255).
        IGMPv2 hosts and IGMPv1 routers: an IGMPv2 host determines whether the router is v1 or v2 from the MRT field of the periodic general IGMP query; in IGMPv1 queries, this field is ZERO. The IGMPv2 host's "version 1 router present timeout" timer is 400 seconds.
        IGMPv1 hosts and IGMPv2 routers: the router detects v1 hosts from their IGMPv1 reports. With one or more IGMPv1 hosts listening for a particular group, the router essentially suspends the optimizations that reduce leave latency. The IGMPv1-host-present countdown timer is 180 seconds in IGMPv1 and 260 seconds in IGMPv2 (based on the Group Membership Interval).
        IGMPv3: allows a host to filter incoming traffic based on the source IP addresses from which it is willing to receive packets, through a feature called “Source-Specific Multicast” (SSM). It allows a host to indicate interest in receiving packets only from specific source addresses or from all but specific source addresses, sent to a particular multicast address.
        The destination address for an IGMPv3 report is 224.0.0.22; the message type is 0x22.
        How does a host learn group source addresses? Cisco designed URL Rendezvous Directory (URD) and IGMPv3 Lite to provide these capabilities until IGMPv3 support is fully available.
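
        A toy Python simulation (my own; assumes three hosts and the IGMPv2 defaults above) of report suppression: every member picks a random delay within the Max Response Time, and only the host whose timer fires first actually sends the report.

import random

MRT_SECONDS = 10.0                         # default query response interval
hosts = ["host-A", "host-B", "host-C"]     # members of the same group
delays = {h: random.uniform(0, MRT_SECONDS) for h in hosts}
reporter = min(delays, key=delays.get)
print(f"{reporter} reports after {delays[reporter]:.1f}s; "
      f"the others hear it and suppress their own reports")
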
    LAN Multicast Optimizations
        CGMP: L2 protocol, permits router to communicate L2 information it has learned from IGMP to switches.
        only routers generate CGMP messages, switches listen. CGMP needs to be enabled on both ends of the router-switch connection over which CGMP is operating.
        The Destination Address of CGMP messages is always the well-known MAC 0100.0cdd.dddd.
        Important info in CGMP messages is: Group Destination Address (GDA) and Unicast Source Address (USA).
        The router sends a CGMP Join message (every 60s) with GDA = 0 and USA = its own MAC.
        When the router receives a join request from a host, it sends a CGMP message with DA = the well-known MAC, USA = the host's MAC, and GDA = the group MAC: "a host with USA MAC of xx has requested multicast traffic for the GDA group, so map your CAM tables accordingly."
        Leave: R1 sends GDA = group and USA = 0 to say that no host is interested any more.
        When the "clear ip cgmp" command is entered on the router to clear all CGMP entries on the switches, the router sends a "delete all groups" CGMP Leave message with GDA and USA both set to 0. When switches receive it, they delete all group entries from their CAM tables.
    RGMP: an L2 protocol that enables a router to communicate to a switch which multicast group traffic the router does and does not want to receive from the switch. The router can reduce its overhead this way.
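
To summarize the CGMP GDA/USA combinations above, here is a small Python sketch (a simplification of the semantics, not the actual wire format):

CGMP_MEANINGS = {
    ("join",  "group MAC", "host MAC"):   "map this host's port to the group",
    ("join",  "zero",      "router MAC"): "router advertises itself (every 60s)",
    ("leave", "group MAC", "zero"):       "no host wants this group any more",
    ("leave", "zero",      "zero"):       "delete all groups (clear ip cgmp)",
}
for (msg, gda, usa), meaning in CGMP_MEANINGS.items():
    print(f"{msg:5}  GDA={gda:9}  USA={usa:10} -> {meaning}")
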