Nutanix ROBO, Edge, and Tactical – Design – Part 7- TDCs

January 11, 2021|Design, ROBO

Here are the things that I am going to cover in this design series. As each blog is written, I will update the links below to the other pages. The idea here is to create a live design book that can be referenced and updated for different types of use cases. This first one is ROBO.

  • Use cases
  • Node types
  • Licensing
  • Sizing
  • Edge
  • Mobile / MEC
  • Tactical Data Centers
  • Management
  • Provisioning
  • Hypervisors
  • Single and dual node
  • Networking
  • Bandwidth
  • Replication
  • SD-WAN
  • NFV
  • VDI
  • Re-use existing hardware
  • Re-use existing storage
  • Backup
  • Cost
  • Competitive outlook
  • Risks
  • External factors (radiation, heat, water, etc.)
  • Designing for each use case

Tactical Data Centers

Tactical data centers have unique characteristics that other ROBO and edge environments do not. They need to be deployed quickly, have a small footprint, be mobile, maintain data security if they are compromised physically, be resilient to physical damage and to the elements, and be battery backed. They also need to adhere to the strict security standards that are required by federal government and law enforcement agencies.

First let’s look at some use cases, then we’ll dig into the technology required.

1) Rapid Emergency Response / Disaster Relief

When natural disasters occur, such as hurricanes, forest fires and flooding, regular communication channels and systems are ineffective. Emergency personnel need reliable real-time voice and data communications systems. These are provided by base stations that connect to localized teams, and backhaul networks, providing the required services and connectivity to save lives and rebuild infrastructure.

2) Mobile and short-lived deployments

Mobile and modular datacenters for short-lived deployments need to be stood up and started very quickly, provide applications and services immediately, and be resilient to component failure and harsh environmental conditions. They get moved around a lot, jostled and battered. They cannot afford a service outage just because a server has had components or connections shaken loose.

3) Field office / Forward Operating Base (FOB)

When operations and missions require a longer-term (but possibly mobile) base, an FOB will be set up. These will have applications and services that are centrally managed, updated and deployed. There may also be data replication from FOBs to regional datacenters in Main Operating Bases (MOBs). The datacenter could be in a permanent structure with racks, but most likely with a smaller-than-normal footprint.

4) In-theatre tactical environments

In-theatre tactical environments have a lot of similarities to Mobile and Short-lived deployments. However, they may also have GTFO (Get The **** Out) operational procedures. Those might include taking data with you, or erasing local systems. These systems may be in command posts or in vehicles. 

With command posts, mobility is normally not immediate but is done in several phases. First, troops would start with tent infrastructure, then generators and UPS units. Then local network switching, server, and storage infrastructure would go up. Next would be satellite and other RF links, followed by the running of cables to provide the local area network in the command post.

There has also been a shift from relying on dedicated proprietary data communications systems to ruggedized COTS (commercial off-the-shelf) systems that adhere to the required security specifications. This has changed the landscape of communications methods for classified data. The old method was:

In an FOB, use a locally wired LAN connected to another LAN via point-to-point RF or a secure satellite link. This meant that no Wi-Fi or cellular/LTE communications could be used for classified data.

In the new method, validated COTS hardware is used with layers of encryption in the hardware, in the network and at the application level. This increases the capability for situational awareness with sensor data and mobile communications. It also increases network availability by leveraging alternative network paths and not having a single point of failure.

A good example of this is in the United States, where the NSA has established a program called Commercial Solutions for Classified.

The ultimate goal for in-theatre tactical environments is to be secure, quick, agile and with a high level of communication performance and availability. I’ll go deeper into tactical network architecture in another blog post when I discuss C4/C5/C6ISR. Here is a quick link that has a good overview. Basically, it is an acronym that encompasses the technological capabilities of a platoon, company, division, etc.

C2 is Command and Control

C3 is Command and Control, and Communications

C4 is Command and Control, Communications, and Computers

C5 is Command and Control, Communications, Computers, and Cyber Defense

C6 is Command and Control, Communications, Computers, Cyber Defense and Combat Systems

ISR is Intelligence, Surveillance, and Reconnaissance 

Most modernized militaries have some aspect of C4ISR / C5ISR in theatre, which can be achieved with a tactical datacenter. C6ISR is reserved for integrated naval weapons systems like AEGIS.

Now let’s look at what the technology looks like.

Nutanix has partnerships with a number of hardware vendors that operate in this space. Nutanix provides the software and the partner vendor provides the hardware. In the TDC space, a few partners stand out: KLAS, Crystal and HPE.

I would be doing a disservice if I only mentioned the hyperconverged infrastructure in the solutions, because to make it work there is always a networking component. There may also be other supporting systems that need integration and connectivity. So let’s first look at what a logical diagram for the entire network topology may look like.

There are systems in place to get data from sensors. This data could be for surveillance, or for EMF radiation signatures, or a myriad of other things. The data can then be analyzed using an inference engine to provide immediate, actionable information. That inference engine can operate within the TDC.

There are physical connections to each of the networks available to a site, but there are also management platforms and controllers for the connections. For instance, a platform used for managing the mesh networking and data synchronization of tactical radio endpoints could run on the TDC. Or cellular base-station software could run in a VM on the TDC, or a virtual routing platform to connect radio, cellular, satellite and point to point microwave links. 

In the diagram I show 3 x TDCs and a main operating base. Some TDCs can communicate directly via point-to-point links, while others need to route through other TDCs. This allows for the replication and distribution of data and uplinks that can provide alternative routes if some forms of communication are compromised. As vehicles or troops are mobile, they can use their comms devices to access any of the available networks as a client, then route to wherever is needed, automatically.
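To make the alternative-route idea a bit more concrete, here is a minimal Python sketch. The node names and links are purely hypothetical (they are not taken from the diagram), and a plain breadth-first search stands in for whatever routing protocol a real deployment would use. It only illustrates that traffic can still reach the MOB through another TDC when a direct link is lost.

```python
from collections import deque

# Hypothetical topology: node names and links are illustrative only.
LINKS = {
    ("MOB", "TDC1"),
    ("MOB", "TDC2"),
    ("TDC1", "TDC2"),
    ("TDC2", "TDC3"),
}


def build_graph(links):
    """Turn a set of undirected links into an adjacency map."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(links, src, dst):
    """Breadth-first search; returns a fewest-hop path or None."""
    graph = build_graph(links)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        # Sort neighbours so the result is deterministic.
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Normal operation: TDC1 reaches the MOB directly.
print(shortest_path(LINKS, "TDC1", "MOB"))

# Simulate losing the direct TDC1 <-> MOB link: traffic goes via TDC2.
degraded = LINKS - {("MOB", "TDC1")}
print(shortest_path(degraded, "TDC1", "MOB"))
```

In this made-up topology TDC3 has no direct link to the MOB at all, so it always routes through TDC2, which is the "route through other TDCs" case described above.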

It may seem complicated, but it’s really not, nor should it be. In the following diagram, I highlight the important part of this architecture, as it relates to the Nutanix TDC.

In these TDCs you have your HCI infrastructure and the local switching to connect the nodes. The local networking may include firewalls, routers and other WAN connectivity and isolation devices. However, those are out of the scope of what I show with the following architectures. Some networking devices can actually run as software appliances within the TDC, further reducing 3rd party hardware requirements.

KLAS Voyager2 and the HPE DX8000 come with networking built into them, while the Crystal servers do not. There are benefits and drawbacks to each approach. With Crystal, it is assumed that you will utilize your own pre-existing switching, so nothing is provided. With the other two, the networking has a very small profile and does not hinder the setup, rack mounting, or overall space.

KLAS has the Voyager Tactical Data Center solution.

Overview of the KLAS Voyager TDC

 Voyager 8 case and chassis with built-in UPS and AC/DC charge options

  • 4x TDC Blades each with
    • Xeon D CPU with 128GB RAM
    • NVMe storage for caching
    • 4x 2.5″ SATA SSDs
    • VIK for easy configuration changes
  • Voyager TDC Switch with
    • 12x 10 Gbps ports with copper and fiber options
    • 121 Gbps backplane for line-speed processing on all ports simultaneously
    • 40 Gbps trunk for interconnection with 3rd party switches
    • Inter-VLAN routing in hardware at line rate
    • 1x 40 Gbps QSFP+ port for high-speed uplink. Can also operate as 4 x 10 Gbps SFP+ ports using included breakout cable
    • Port mirroring, IPFix
    • Ansible playbook management supported
    • Voyager Ignition Key (VIK) for configuration and storage
  • Validated with Nutanix AOS on VMware ESXi and Nutanix AHV
  • Can be removed from the travel case and mounted in a regular 2- or 4-post rack
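As a rough illustration of what four 128GB blades add up to, here is a small Python sketch. The blade count and RAM per blade come from the list above. Keeping one node's worth of capacity in reserve for a failure, and the per-node controller VM reservation, are my assumptions for the sake of the example. The 32GB figure in the last call is a placeholder and not a Nutanix number.

```python
def usable_ram_gb(nodes=4, ram_per_node_gb=128, spare_nodes=1, cvm_ram_gb=0):
    """Estimate RAM left for workloads in a small cluster.

    nodes / ram_per_node_gb: from the Voyager TDC list above.
    spare_nodes: nodes' worth of capacity held back for failures (assumption).
    cvm_ram_gb: per-node controller VM reservation (placeholder, assumption).
    """
    if spare_nodes >= nodes:
        raise ValueError("spare_nodes must be smaller than nodes")
    return (nodes - spare_nodes) * (ram_per_node_gb - cvm_ram_gb)


print(usable_ram_gb(spare_nodes=0))   # raw total across all four blades
print(usable_ram_gb())                # one blade's worth held in reserve
print(usable_ram_gb(cvm_ram_gb=32))   # with a hypothetical per-node reservation
```

Treat the output as a starting point for a conversation about sizing, not as a sizing result.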

The KLAS Voyager system is very modular and the TDC is not the only solution they offer. You can mount routers, radios, and a whole host of other devices in the Voyager 8 series chassis. Also, the chassis depth is very short because of the orientation of the nodes and switching. This makes it light and very portable.

All hardware maintenance is done from the front of the chassis, including node/switch/cabling, etc. The chassis can also be removed from the protective hard-case to be mounted in a rack.

Crystal Group has a whole range of ruggedized servers and switches. 

The form factor used for Nutanix is the 2U rugged Carbon Fibre server.

The footprint is not going to be as small as the KLAS or HPE solutions, but these things are indestructible. From the Crystal Group site:

“…withstands harsh environmental conditions, including shock and vibration, temperature extremes, sand/dust, sea spray/salt fog, and more.”

Stronger and lighter than their steel chassis counterparts, they minimize SWaP (Size, Weight and Power), while maximizing performance. The minimum rack height for a scalable cluster would be 7RU, with 3 x 2U nodes and 1 x switch. However, if you wanted to minimize size, it would be possible to use only 3RU with a single-node cluster and a single switch.
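The rack-unit numbers above are simple arithmetic, sketched here in Python. The 2U node height comes from the text. The 1U switch height is my inference from the 7RU and 3RU totals, so treat it as an assumption.

```python
def rack_units(nodes, switches=1, node_ru=2, switch_ru=1):
    """Total rack units for a cluster of 2U nodes plus switches.

    node_ru: 2U rugged server, from the text.
    switch_ru: assumed 1U, inferred from the totals quoted above.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes * node_ru + switches * switch_ru


print(rack_units(3))  # scalable three-node cluster: 7
print(rack_units(1))  # single-node cluster: 3
```

Any extra rack-mounted gear would simply add its own height to the total.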

The switching used is a ruggedized version of a Brocade ICX switch.

If you wanted to have a completely self contained environment that can be maintained locally with no other hardware, like a laptop, then it may make sense to add a KVM to this. Crystal provides a 1RU 8-port KVM that fits the bill.

Space will be required for maintenance of cabling, because all cabling and power connectivity is in the rear. Drive removals are done in the front, which is normal. However, in regular operation these nodes would not need any maintenance done for 10 years on average. So the maintenance issue is not a real concern.

Crystal Group Solution overview

Crystal Edge Networking

HPE has the DX8000 

The DX8000 series is essentially the same as the EdgeLine 8000 (EL8000), but specifically designed to run Nutanix AOS.

Here is the solution brief.

There are some specific things that I like about the DX8000, which I think are unique. With the PCI riser cards in each node, you can add network adapters to connect to external switching, or add GPUs for additional processing capabilities. If you want to keep things a bit more compact, you can add internal unmanaged switches to the chassis. This will then use internal traces from the nodes (you will need internal network adapters) to the switch cards (as shown above). The switches will then uplink to another switch for greater connectivity. I don’t think you can make a VPC with those uplinks, as they are pretty basic. So they would essentially be a single uplink from each internal switch.

Another interesting thing with the DX8000 is that the width is pretty small, so you could have two of these units side by side and not use any more rack units. That would give you 8 x nodes in 5RU, which is pretty dense. For non-rack environments, the DX8000 can be bolted into a vehicle, or moved around fairly easily.
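Here is the same density math as a small Python sketch. The 5RU height and two-units-side-by-side layout are from the paragraph above, and four nodes per chassis is implied by the "8 x nodes in 5RU" figure. Scaling it to taller racks is my own extrapolation, not something from HPE.

```python
def nodes_in_rack(rack_ru, chassis_ru=5, nodes_per_chassis=4, chassis_per_row=2):
    """Count nodes that fit in a given rack height.

    chassis_ru and chassis_per_row: from the paragraph above.
    nodes_per_chassis: implied by 8 nodes across two side-by-side units.
    """
    rows = rack_ru // chassis_ru  # only whole chassis rows fit
    return rows * chassis_per_row * nodes_per_chassis


print(nodes_in_rack(5))   # 8 nodes, matching the figure above
print(nodes_in_rack(10))  # 16 nodes (extrapolation)
```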