IT is changing so fast these days that anyone who doesn’t move with the changes it brings gets left behind. We moved from the mainframe era to the virtualisation era and now into the cloud era. One thing that hasn’t changed from the mainframe era to the current one is the pace of new technology adoption, be it All Flash Arrays, Hybrid Arrays or Hyper-Converged Infrastructure.
It’s been a while since my last post. Truth be told, I have mostly been busy with work during the last few weeks. My little one has also been having some medical issues, owing to which I haven’t been able to socialise as much or spend time on blogging, even though I have a backlog of articles in my drafts 🙂
Now let’s look at this. We all know of companies that start with a POC for a product or a technology and then, as mysteriously as can be, it turns into production at the snap of a finger. It is never OK for a POC to turn into production. As long as there is an architect who is worth their salt, they won’t let it happen. Now let’s look at the reasons why a cloud POC, or specifically a Hybrid Cloud POC, should never be ‘productionised’:
- Transformation – Regardless of which cloud technology you use (VMware/OpenStack/MS/Cisco), the aim of a cloud POC is to show the orchestration and automation aspect. It does not cater for end-to-end utilisation or for transforming the way a business looks at IT at a fundamental level. If a cloud vendor tells you that moving to a cloud model isn’t transformational, they are either delusional or not worth engaging. POCs don’t handle transformation well.
- Integration – There is no extensibility (thanks @huberw and @timgleed for this word) or integration aspect that a company or a vendor would look at for a POC. If you are not going to integrate the cloud transformation into every single aspect of your company’s IT systems, you are not doing cloud correctly. And no vendor would agree to include integration or application extensibility in a POC, purely because it is a ‘how long is a piece of string’ question.
- Security – This is never done right in a POC. If you think you’re going to carry the same setup over into production, you are dreaming. If a vendor tells you they are serious about security, don’t take their word for it. Security is something that all vendors (new and old) need to customise for each client. Security doesn’t just mean network security, perimeter security, IPS etc. It also means user security and data security; most security incidents happen internally.
- XaaS strategy – Anything as a Service is a rabbit hole that no one ever gets out of, for the simple reason that you can always think of something else that can be automated or orchestrated. This is not necessarily a bad thing, except when you do it in a POC. Limit your exposure: do a lot of easy things or a few harder things, but never attempt to do everything.
- Limited resources – When doing a POC you have limited resources (and even if you don’t, see the point above). You are doing a POC because you want to prove that something works the way you think, or the way a vendor says, it does. So shortcuts are going to be taken and there will be things which aren’t perfect. Well, it’s a POC, remember; it’s not meant to be perfect. It’s not just hardware resources I am talking about. More important than the hardware is the “human” tech resources, or the lack of them, which places a lot of strain on a POC.
- Project Management / Program Management – The journey to cloud should be treated as a major program, with multiple streams, specific goals and limited time and budget. A POC can be done without any project management initiatives. Most of the prod environments with issues that I’ve seen are the ones that started as a POC and were productionised. Guess what, the POC wasn’t run as a project or a program because, let’s be honest, smaller POCs don’t require project management. A cloud POC does.
- Disaster Recovery – So if you productionise your POC, where is your DR plan? Yep, cloud environments need DR / BC plans more stringently than non-cloud ones. “Cloud doesn’t have an outage, well, unless you are Office 365 / Azure 😉”.
- Limitless Scaling – No. Unless all the business / application / technical / user requirements are captured for a POC and you’ve built a pod architecture that scales limitlessly based on those requirements, scaling up or scaling out a cloud POC for production is stupid. There, I said it.
- SDN/SDS Strategy – If you don’t have an SDN or an SDS strategy for your cloud environment, then stop reading this blog and go do your homework on Cloud 101. When you’re ready to think about SDN/SDS, make sure you identify each and every one of the requirements that has an effect on it. Most POCs don’t have an SDN or SDS strategy.
- Business Representation – Thanks to my friend Grant Orchard (@grantorchard) for this point. Most POCs are done for the IT folk and don’t include the business. Without the involvement of business units and, more importantly, business owners, a cloud model doesn’t work.
So there we are, a few reasons why a cloud POC should not be productionised. I am happy to get any feedback on why any one of these is not a good reason, or on what other important reasons I’ve missed.
I have been seriously thinking about and prepping for #VCDX-Cloud. Thinking about CMA couldn’t be more different from when I was starting my prep for VCDX-DCV.
Having said that, I came across an interesting discussion on Twitter yesterday, where a few guys I know and a few I know on Twitter (you all know who you are) were discussing which VCDX stream one should be focusing on right now.
Let me clarify that last sentence. Since a lot of the current VCDXs are DCV, with a few CMA / EUC / NSX doubles, somehow DCV might be considered not as tough as CMA. Well, that’s what I got from the conversation anyway; I was wrong about where the discussion was heading (thanks @grantorchard for pointing it out).
Nonetheless, it got me thinking about how one should decide whether VCDX-CMA is the way to go or whether VCDX-DCV is still a good stepping stone. Looking at the blueprints of both VCDX streams, it’s hard to point out the difference in what is required. Yes, I mean CMA has to have a cloud component whereas DCV has to be vSphere. Everyone knows that. Let’s hear it from someone who has done VCDX-CMA recently: here is Will Huber (@huberw) talking about what is different in his VCDX-CMA submission as opposed to the VCDX-DCV one, yet how the blueprint is still very vSphere centric.
But can we make a DCV submission a CMA submission just by whacking the cloud component on top? I don’t think so. You can defend a cloud design for DCV, but I don’t think it’d fly if you ‘include’ cloud components in a DCV design and try to defend it for CMA (if you get invited, that is). In my opinion a VCDX submission is all about the focus of the Architecture Document and meeting the business/technical requirements with the proposed design. The requirements for a virtualisation design will definitely be wayyy different from the requirements for a cloud design, mainly because the focus of the project is completely different.
For a cloud design (in addition to the usual vSphere design), one needs to consider the following:
- How is the cloud aspect going to affect network and security of the design?
- How is the cloud aspect going to affect how RBAC is designed and managed?
- How do we ensure that the automation aspect of it is secure and, more importantly, adheres to the business requirements?
- How do we do the capacity forecast? Based on resource consumption or based on business requirements?
- How are RTO / RPO going to be affected? (Cloud is meant to be always on, remember.)
- What about backups? How do we ensure backup is provisioned seamlessly to the user?
- What about provisioning operations? Who has what rights on the VM?
- How do we define the processes which aid the lifecycle of service strategy, design, transition, operation and improvement?
- How do we manage upper-layer presentation of the service catalog and the accountability and chargeback of cloud resources?
- How do we build an extensible orchestration design wherein other components can be plugged in?
- What about the ITIL/ITSM processes for the environment?
- What about end-user training and enablement to use the new environment?
- What other criteria need to be considered when looking at PaaS or SaaS or XaaS?
- How do we report on usage?
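To make the chargeback and usage-reporting questions a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the rate card, the record fields and the business-unit names are hypothetical, and a real design would take usage from whatever metering the chosen cloud platform provides.

```python
from collections import defaultdict

# Hypothetical rate card: cost per unit per hour (illustrative numbers only).
RATES = {"vcpu": 0.02, "ram_gb": 0.01, "storage_gb": 0.001}


def chargeback(usage_records):
    """Sum the cost of each usage record per business unit.

    Each record is a dict with a 'business_unit', the allocated
    'vcpu', 'ram_gb' and 'storage_gb', and the 'hours' it ran.
    """
    totals = defaultdict(float)
    for record in usage_records:
        hourly_cost = sum(record[resource] * rate for resource, rate in RATES.items())
        totals[record["business_unit"]] += hourly_cost * record["hours"]
    # Round only at the end so per-record rounding errors do not accumulate.
    return {unit: round(total, 2) for unit, total in totals.items()}


if __name__ == "__main__":
    sample = [
        {"business_unit": "finance", "vcpu": 2, "ram_gb": 8, "storage_gb": 100, "hours": 10},
        {"business_unit": "hr", "vcpu": 1, "ram_gb": 4, "storage_gb": 50, "hours": 20},
    ]
    for unit, cost in sorted(chargeback(sample).items()):
        print(f"{unit}: {cost}")
```

The point of the sketch is that the report is keyed on the business unit, not on the VM, which is the shift in focus that a cloud design has to make.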
See where I am going with this? With a DCV design, it’s all about the VMs: how we enable the management of VMs and the performance of VMs without affecting other VMs. For a cloud design, it’s all about the users and the ‘apps’ that they use. It is still about the VMs for the administrator, but not for the end user. The end user couldn’t care less whether his ‘application’ is running in the cloud or on a ‘PC Server’ in the DC; he wants it to perform. My take is you can’t have a VCDX-CMA design without a very robust VCDX-DCV backbone.
I have always maintained that at the end of the day VCDX is just a “vendor cert”, it doesn’t give you knowledge or make you more intelligent. What the VCDX journey taught me is to focus on what the solution is going to achieve, not what the solution is. I still work in the same way, speak to people the same way, but yeah I think differently when asked a question.
VCDX is only about validation of your VMware <insert stream> knowledge. It doesn’t make you an architect or make you a better one if you are already an architect. I know plenty of enterprise architects who aren’t VCDX certified and they are still very good. Knowledge should be the pursuit of whatever certs you go for. The moment you put a tag (with make, model and price) on knowledge it becomes useless.
PS: The last statement also applies to another Twitter comment, where someone thought that his VMware certs are ‘not as useful anymore’ in pursuing NPX. Guess what, #NPX is also ‘just another vendor cert’. It validates that you have knowledge in designing and administering very complex Nutanix / HCI environments (albeit on multiple hypervisors). Period. Nothing else. Neither of these certs (VCDX or NPX) would be relevant, let alone a requirement, for an architect looking at using IBM Cloud, HP Helion, AWS, OpenStack or Google Compute Engine. Knowledge about any of these + NPX + VCDX would give you a definite advantage in deciding “what meets your business requirements best”.
I recently had a discussion with a couple of my colleagues while working on a PaaS / IaaS EHC project. We were trying to come up with test cases for testing PaaS capability for applications. While discussing that, I figured I would blog about the testing process and the kind of model that a company needs to adhere to while testing. Do you test all aspects of the solution, including the infrastructure and middleware layers, or do you test just the application layer? The answer is that it depends on the platform you are going to use: are you using a publicly available PaaS platform like App Engine, Azure or AWS Elastic Beanstalk, or do you want the development capability hosted internally in a private / hybrid cloud? The testing process for the two differs quite significantly. Or does it? Bear with me...
PaaS has a major benefit for production deployment. Even with dev, test, stage and pre-prod, the smallest of changes can have a major impact and tend to cause big outages if not tested completely. So to ensure that the PaaS platform is robust, we need to make sure that all the infrastructure and processes in the Hybrid Cloud environment can support that robustness.
Any PaaS platform needs to demonstrate the following capabilities:
- Operational Visibility and Control
- Security
- Services
- Automation
- Portability
Now everyone has seen the different ‘XaaS’ cloud models, so I am not going to repeat them. I mean, well... here it is.
Now let’s look at who provides the capability for each of the characteristics mentioned above the XaaS picture:
- Operational Visibility and Control – VMware + Pivotal CF
- Security – RSA + VMware + Pivotal CF
- Services – VCE + EMC + VMware
- Automation – VMware + Pivotal CF
- Portability – VMware + Pivotal CF
So when the client wants to test the PaaS applications, doesn’t that mean that all other aspects of the platform have to be tested as well? Absolutely.
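The capability list above can be written down as a small lookup table, which makes the "everything is in scope" point easy to see. This is only an illustrative sketch: the mapping is copied from the list in this post, while the function names are my own and hypothetical.

```python
# Capability -> components that provide it, as listed in this post.
CAPABILITY_PROVIDERS = {
    "Operational Visibility and Control": ["VMware", "Pivotal CF"],
    "Security": ["RSA", "VMware", "Pivotal CF"],
    "Services": ["VCE", "EMC", "VMware"],
    "Automation": ["VMware", "Pivotal CF"],
    "Portability": ["VMware", "Pivotal CF"],
}


def components_in_scope(capabilities):
    """Return the sorted, de-duplicated components behind the given capabilities."""
    scope = set()
    for capability in capabilities:
        scope.update(CAPABILITY_PROVIDERS[capability])
    return sorted(scope)


def capabilities_by_component():
    """Invert the table: which capabilities depend on each component."""
    inverted = {}
    for capability, providers in CAPABILITY_PROVIDERS.items():
        for provider in providers:
            inverted.setdefault(provider, []).append(capability)
    return inverted


if __name__ == "__main__":
    # Testing every capability pulls in every component of the stack.
    print(components_in_scope(CAPABILITY_PROVIDERS))
    for component, caps in sorted(capabilities_by_component().items()):
        print(f"{component}: {', '.join(caps)}")
```

Inverting the table shows that VMware sits behind all five capabilities, so no capability can be tested without also exercising the infrastructure layer.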
So when you are testing a platform for Hybrid Cloud PaaS, you need to ensure not only that the application is able to scale out as and when required, but also that your infrastructure services and middleware services are able to support that scale-out model. This is where EHC has its strength: with a pre-tested stack of software and hardware entities, we ensure that when a client deploys the EHC Platform for PaaS, the only testing they need to worry about is testing the application they’re deploying.
The figure below shows the architecture of the EMC EHC for Pivotal CF (Cloud Native Applications).
With EHC, we test all the components that make up the bulk of the IaaS and PaaS layers. Because we are part of the same federation as companies like VMware, RSA, Pivotal and VCE, there are a lot of smart people who develop the components of the solution and ensure that they work well together.
Pivotal CF Elastic Runtime, for example, delivers a dynamic & highly available platform to enable application owners to easily push, scale, and maintain applications over their lifecycle. The picture below shows the main components of the Pivotal CF Elastic Runtime Environment.
There is so much to learn with the new product offerings that we have with EHC especially in the PaaS and Big Data Implementations. This blog is just an initiative for me to keep learning and improve my knowledge beyond ‘infrastructure’.
Thanks for reading.
Returning to the blog post series about EHC, this post will cover the EHC Federation Edition with Disaster Recovery. We will be looking at the integration between VMware SRM and vCAC. Yes, that’s right. Some of you might be unaware that DR for vCAC (or vRA) is supported by VMware SRM. Let us look at the conceptual diagram of how this would work.
Note: This blog doesn’t discuss the DR/HA Availability of vCAC. That will be discussed in subsequent posts.
We will need to create 2 vCenter endpoints in vRA. This will ensure that both the protected and the recovery site VMs can be managed by the vCAC instance. So let’s go ahead and create the 2 endpoints in vRA. Note that both the vCenters have been named.
DR for vCAC controller environments is supported in 3 ways:
- Protected Mode
- Recovery Mode
- Test Mode
These are not dissimilar to the native protection methods in VMware SRM. Most of the workload migration and restart of the VMs is automatic, as it is natively done by SRM. However, there are a few manual steps that are required to ensure that vCAC can monitor and manage the “recovered” workloads. Let’s now look at the behaviour of vRA during the various SRM phases.
Now let’s look at an example of failover and the steps that are required to ensure that vRA is still in sync after the failover / migration is finished.
Once the planned migration or disaster recovery steps are finished inside SRM, check vRA appliance for any errors. You will see a similar error on all the VMs managed by vRA.
To remediate this error, the following steps have to be done:
Turn off Automatic Data Collection for Site A vCenter (under compute resources).
Run Data Collection Manually for Site B vCenter (under compute resources).
Perform a manual failover for the VM on Site A as shown below:
Go to Infrastructure – Machines – Managed Machines.
Hover over the VM and select Change Reservation.
Once the pane changes to the reservation, change the appropriate values as shown below (these are the values in the lab environment and will be different from your actual values).
Note: If the blueprint is shared, that value doesn’t need to be changed.
Once these steps are completed, turn data collection on for Site A vCenter.
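The order of the remediation steps matters, so here is a minimal Python sketch of the sequence as a runbook. It is not a real vRA or CloudClient integration: `RecordingClient` and its methods are hypothetical stand-ins that only record what would be done, and the vCenter, VM and reservation names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingClient:
    """Hypothetical stand-in for whatever drives vRA (UI, CLI or API wrapper).

    It performs no real calls; it only records the actions in order.
    """
    log: list = field(default_factory=list)

    def set_auto_data_collection(self, vcenter, enabled):
        state = "on" if enabled else "off"
        self.log.append(f"auto data collection {state}: {vcenter}")

    def run_data_collection(self, vcenter):
        self.log.append(f"manual data collection: {vcenter}")

    def change_reservation(self, vm, reservation):
        self.log.append(f"change reservation: {vm} -> {reservation}")


def resync_after_failover(client, site_a_vc, site_b_vc, vms, site_b_reservation):
    """Replay the manual steps from this post in the order they are listed."""
    # 1. Stop automatic data collection against the failed-over site.
    client.set_auto_data_collection(site_a_vc, enabled=False)
    # 2. Collect from the recovery site so vRA sees the recovered VMs.
    client.run_data_collection(site_b_vc)
    # 3. Move each managed machine to the recovery-site reservation.
    for vm in vms:
        client.change_reservation(vm, site_b_reservation)
    # 4. Re-enable automatic data collection for Site A.
    client.set_auto_data_collection(site_a_vc, enabled=True)
    return client.log


if __name__ == "__main__":
    steps = resync_after_failover(
        RecordingClient(), "vc-site-a", "vc-site-b", ["vm01", "vm02"], "res-site-b"
    )
    for step in steps:
        print(step)
```

Swapping the recording client for one that calls a real automation tool would keep the ordering logic unchanged, which is the only thing this sketch is meant to capture.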
This can be automated using the CloudClient tool: https://developercenter.vmware.com/web/dp/tool/cloudclient/3.2.0. Thanks to Ben Meadowcroft (@benmeadowcroft) for bringing it to my attention.
For more information on vCenter/SRM/VCO/vRA integration, visit the links below:
- ManagedBy property https://pubs.vmware.com/vsphere-55/index.jsp#com.vmware.wssdk.apiref.doc/vim.vm.ConfigSpec.html#managedBy
- DR for vCloudDirector/vCenter Page 5 of http://www.vmware.com/files/pdf/techpaper/VMware-vCloud-Directore-Infrastructure-resiliency-whitepaper.pdf
- DR for View Page 7 of https://www.vmware.com/files/pdf/techpaper/vmware-view-vcenter-site-recovery-manager-disaster-recovery.pdf
- SRM Events on recovery http://pubs.vmware.com/srm-55/topic/com.vmware.srm.admin.doc/GUID-B62ACB9E-955B-4499-900D-38F2D7FED1E0.html
- SRM public API https://www.vmware.com/support/developer/srm-api/srm_50_api.pdf
- SRM limitations (linked clones) http://pubs.vmware.com/srm-55/topic/com.vmware.srm.admin.doc/GUID-084C089D-9689-4F34-9A75-8AFB980A725E.html