December 17, 2020|Design, ROBO
Here is what I am going to cover in this design series. As each post is written, I will update the links below to point to the other pages. The idea is to create a living design book that can be referenced and updated for different types of use cases. This first one is ROBO.
- Use cases
- Node types
- Tactical Data Centers
- Single and dual node
- Re-use existing hardware
- Re-use existing storage
- Competitive outlook
- External factors (radiation, heat, water, etc.)
- Designing for each use case
When you are looking at sizing an environment there are several factors that need to be addressed and questions that should be considered:
1) What are the workloads that will run on the environment?
2) What does “fit for use” look like in terms of metrics (availability, manageability, performance, recoverability, security)?
3) What are the priorities if trade-offs need to be considered (e.g., fast, good, cheap)?
4) Create a requirements rubric that can be weighted and scored against each candidate solution.
5) Identify requirements that are “must-have”, vs “nice-to-have”.
6) If this is an existing workload, get historical consumption and performance data for the workload and its existing infrastructure.
7) If this is a new workload, model it based on recommended practices for that type of workload.
8) Determine the constraints for a solution (power, space, cooling, physical access, etc.).
9) Determine the hardware vendors being considered and compare their portfolios of viable solutions.
10) Determine the budget for this initiative and evaluate against solutions with a TCO (Total Cost of Ownership).
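The rubric idea from items 4 and 5 can be sketched as a small weighted-scoring helper. This is only an illustration under assumptions: the `Requirement` class, the 0 to 5 rating scale, and the example weights are all hypothetical and not taken from any particular sizing tool.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    name: str
    weight: float            # relative importance (hypothetical scale)
    must_have: bool = False  # "must-have" vs "nice-to-have"


def score_solution(requirements, ratings):
    """Return a weighted score between 0 and 1, or None if a must-have is unmet.

    ratings maps a requirement name to a rating from 0 (unmet) to 5 (fully met).
    """
    total_weight = sum(r.weight for r in requirements)
    weighted = 0.0
    for r in requirements:
        rating = ratings.get(r.name, 0)
        if r.must_have and rating == 0:
            return None  # a missed must-have disqualifies the solution
        weighted += r.weight * rating / 5
    return weighted / total_weight


# Hypothetical requirements and candidate ratings, for illustration only.
requirements = [
    Requirement("availability", 5, must_have=True),
    Requirement("manageability", 3),
    Requirement("performance", 2),
]
option_a = {"availability": 4, "manageability": 2, "performance": 5}
option_b = {"availability": 0, "manageability": 5, "performance": 5}

score_a = score_solution(requirements, option_a)
score_b = score_solution(requirements, option_b)
```

Here `option_b` is rejected outright because it misses a must-have, even though it rates well on everything else, which is the point of separating must-haves from weighted nice-to-haves.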
This is not a comprehensive list, but it serves as a starting point for discussions. All too often people will just look at the source hardware specs and utilization, then try to replicate that in the final solution with a little room for growth. Although this may work as a quick and dirty estimate, it can miss the finer requirement points and produce a functional but non-optimal solution.
With ROBO environments this can make a big difference, because sites are often replicated from a standard configuration or template. If that design is non-optimal, the deficiency is multiplied by the number of sites.
Some key points to address:
- What does the network look like and what is the available port type, count and bandwidth?
- What do the traffic patterns look like? North-south? East-west? Traffic types? Network load?
- How is the network logically segmented?
- What method is used to secure the environment? Network zone ACLs? Layer 3 or Layer 7 controls?
- Is there value in microsegmentation for the workload?
- Is the workload mobile, or does it need to have the ability to migrate automatically or manually?
- Does the workload need a backup or BC/DR plan?
- Is there currently a BC/DR plan in place? Does it meet the application owner requirements?
- Can the BC/DR plan be improved? Would it cost more, or less? Can management complexity be reduced?
- Is there application-level data replication? Are there any infrastructure-native alternatives that may be more beneficial for cost, performance, or manageability?
- How resilient does the platform need to be? How many simultaneous and serial component failures does it need to protect against?
- Does the platform need to be centrally managed?
- Would there be benefit for deployment automation?
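The resiliency question above has a direct effect on node count. As a rough sketch, assuming identical nodes and a single capacity dimension (for example GB of RAM), the check below asks whether the surviving nodes can still carry the workload after a given number of simultaneous node failures. The function names are hypothetical, and the sketch deliberately ignores storage replication overhead, quorum requirements, and platform reservations.

```python
import math


def survives_failures(node_count, capacity_per_node, required_capacity, failures):
    """True if the remaining nodes can still hold the workload after `failures` nodes are lost."""
    remaining = (node_count - failures) * capacity_per_node
    return remaining >= required_capacity


def min_nodes(capacity_per_node, required_capacity, failures):
    """Smallest node count that carries the workload with `failures` nodes down."""
    return math.ceil(required_capacity / capacity_per_node) + failures


# Hypothetical site: 96 GB of workload memory on 64 GB nodes, tolerating one node failure.
nodes_needed = min_nodes(capacity_per_node=64, required_capacity=96, failures=1)
two_node_ok = survives_failures(2, 64, 96, failures=1)
three_node_ok = survives_failures(3, 64, 96, failures=1)
```

In this made-up case a two-node site cannot absorb a node failure, while a three-node site can, which is the kind of trade-off that gets multiplied across every ROBO site.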
With ROBO environments the most common constraints are cost, manageability, and lifecycle management. An organization may have a set budget that it needs to design against, which limits options right out of the gate. This is where we can start looking at the trade-offs of designing against that budget and see whether it can provide an optimal fit-for-use solution. If the budget were increased, would other options that are more in line with the requirements become available?
To size against the most common constraints, let's dig into each of them a bit:
For cost, there are hardware costs, software costs, operational costs, and provisioning costs.
- Hardware will come from your vendor of choice. It's all x86 anyway, so the differences often come down to non-technical aspects like client relationships, procurement vehicles, capex costs for comparable hardware, and vendor support.
- Software costs cover the workload applications, hypervisor, guest OS, and infrastructure components.
- Operational costs can be calculated by the number of individuals supporting the solution, multiplied by the number of hours used for management, multiplied by the staff resource cost per hour.
- Provisioning costs are the operational costs required to stand up a new environment. Sometimes the operational staff resources will be different than the provisioning staff resources, so cost per hour may change.
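The cost breakdown above can be turned into simple arithmetic. The sketch below follows the operational-cost formula from the text (staff count times hours times hourly rate) and, as an added assumption, multiplies the per-site total by the number of sites on the premise that every site uses the same template. All figures are invented, and this is not a full TCO model: it ignores time horizon, support renewals, power, and cooling.

```python
def labor_cost(staff_count, hours_per_person, rate_per_hour):
    """Staff count x hours spent x hourly rate, as described in the text."""
    return staff_count * hours_per_person * rate_per_hour


def total_cost(hardware, software, operational, provisioning, sites=1):
    """Per-site cost components summed, then multiplied across identical sites."""
    return sites * (hardware + software + operational + provisioning)


# Hypothetical figures for one site.
operational = labor_cost(staff_count=2, hours_per_person=40, rate_per_hour=75)
# Provisioning may use different staff, so the hourly rate can differ.
provisioning = labor_cost(staff_count=1, hours_per_person=16, rate_per_hour=100)

per_site = total_cost(30_000, 12_000, operational, provisioning)
ten_sites = total_cost(30_000, 12_000, operational, provisioning, sites=10)
```

Running the same numbers with a lower `hours_per_person` is a quick way to see how much manual management time contributes once it is repeated at every site.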
For manageability, some management solutions can manage multiple sites at no additional cost, while others charge for that capability. In a ROBO environment it is worth knowing which applies. Where the management plane resides also matters: a centralized management plane may be a problem if connectivity to a site is lost, so local management capabilities may also be required.
Lifecycle management includes updates, patches, expanding the environment, and removing end-of-life hardware. If it is a manual process, the expense is usually attributed to operational expense. There may also be limited time available for it because staff are overworked, which results in fewer updates, lower performance, and a higher risk of hitting software vulnerabilities or bugs. If it is automated, is there a cost for the platform that does this?
Here are some Nutanix solutions that work very well in ROBO environments:
Centralized management for multiple clusters / sites.
A license tier on Prism Central that allows for playbook automation for operations, capacity planning, reporting analytics, and VM right-sizing. This will tell you when you are running out of resources, flag issues at any site, and create automated actions for remediation. Essentially a self-driving datacenter model for all your sites.
This allows for microsegmentation, environment isolation at layer 2, layer 3 firewalling, ID-based security policies with Active Directory integration, overlay networking, IPsec VPN, NAT capability, and VPC constructs.
This allows scale-out file sharing for SMB and NFS. It gets rid of CIFS silos, Windows file shares that need constant patching and downtime, the associated security risks, and single points of failure. All Nutanix software tiers come with 1TB of free Files licensing.
Perform async and near-sync replication to another site, with RPOs as low as 20 seconds.
Nutanix On-Prem Leap
Perform orchestrated DR, including start-up order prioritization and IP re-addressing.
This provides the ability to deploy and configure sites via automation from a central location, without the need for an onsite resource (except for racking and cabling).
Here are some resources to help with sizing Nutanix for ROBO environments:
Nutanix Design Guide (See chapter 16, by Greg White)