Tech Field Day 14 Preview – Datrium

Most of my peers know that I'm not one to dig deep into the nuances of storage.  I consider storage a necessary evil, one where there will always be a surly storage administrator to give me the runaround on why a virtual workload isn't performing up to expectations.  However, in an effort to become a more well-rounded technologist, I have decided to make my first Tech Field Day 14 preview post specifically about a storage company.  During the event, which takes place May 11th and 12th of 2017 in Boston, we will hear much more from Datrium, a company that appears to be doing things differently than most of its storage brethren.

The Architecture

I'll say it again: I'm not much of a storage expert.  In fact, in the right moment, you might be able to sneak a made-up storage term past me (say, triple crunch parity).  That said, the thing about Datrium's architecture that stood out to me was its disaggregated approach.  While I've seen various attempts at this idea, I've not seen it taken to the level Datrium has taken it.

Datrium's storage solution, called DVX Rackscale, combines a back-end storage array with a software element installed onto a VMware-based host.  As part of the requirements for that VMware-based host, some amount of flash storage (between one and ten disks) must be installed in the host.  This local flash is used to store read-only, cached copies of the more active data.  Connected to each host, via 10Gb Ethernet, is a single back-end disk array containing the high-capacity, slower-performing disks where the master copy of all data resides.  This back-end array is also where writes occur.  One of the interesting facets of this architecture is the claim that the more hosts you add to the solution, the faster the overall system becomes.  Datrium's data sheets estimate performance with 4K workloads at roughly 100K IOPS per host, with the potential to drive nearly 3,000,000 IOPS in a maximum configuration.  The numbers for 32K workloads are reduced (as expected) but still impressive: 40K IOPS per host and 1.2 million IOPS across a maximum configuration.

There are some maximums to be aware of in the architecture.  For instance, you can only connect 32 VMware ESXi hosts to a single back-end storage array.  It also appears that the back-end array is non-upgradable (the system ships with 12 x 4TB 7200rpm HDDs), meaning the effective capacity you can work with will vary between 60TB and 180TB.  Like many things storage-related, that number will vary wildly based on the data reduction techniques within the system (Datrium expects between 2x and 6x reduction ratios, based on their datasheets).

An Industry First? – Blanket Encryption

Beyond the core DVX architecture are some unique features not found on many other storage devices.  One such feature is called Blanket Encryption.  This feature, a completely software-based solution, manages to overcome some of the tradeoffs we see in current offerings.  Many hardware solutions only go as far as offering data-at-rest encryption.  Encrypt at the source instead and you pay a penalty in data reduction, wasting a lot of resources to provide minimal protection.

Datrium's DVX solution claims to provide the best of all worlds.  Because they rely on their own software in the hypervisor stack, they can protect data at the source, in the VMware host's RAM and local flash disks, without incurring the penalties you'd see from the hypervisor alone.  The very same software also performs in-flight encryption from the host to the storage array.  According to Datrium's datasheets, the data is first fingerprinted, then compressed, and finally encrypted while it's being created in ESXi host RAM, and is then transmitted to the storage array as ciphertext.  All of this is done using some of Intel's advanced instruction sets to offset the performance hit of these operations.

My Non-Storage Enthusiast Take

There is plenty more to look at when it comes to Datrium's DVX solution.  Recently added to the portfolio are a unique replication system and a snapshot catalog.  The replication system is extensive enough to almost justify an entire write-up on its intricate details.  Combine it with plans to work toward public cloud replication (while ensuring the public cloud data can take advantage of the same on-premises blanket encryption!) and we have a lot more to talk about with Datrium.  Personally, I'm excited to hear and see more about blanket encryption.  It also seems that Datrium has decided to get into the compute market with their own Intel-based servers, preconfigured for VMware ESXi and the DVX software.  It'll be interesting to ask why they felt the need to jump into that overly saturated hardware market.


Why I’m Heading to OpenStack Summit – An Architect Perspective

The other day I was presented with a question from someone in my office.  They had seen a LinkedIn update I had posted about being invited to Tech Field Day 14 and that I was going to OpenStack Summit.  I posted this shortly after arriving back from Seattle and a Microsoft hybrid cloud airlift event (focused on Azure Stack).  They found it interesting that we had corporate initiatives to eventually implement Microsoft Azure Stack (along with expanding the Cloud Service Provider practice within).  They wanted to know why I even wanted to go to OpenStack Summit.

I felt this was an interesting question to start writing about.  It made me realize that there are many in our industry who draw very definitive lines between the various cloud platforms out there and can't seem to fathom why anyone would want to pay attention to all of them, rather than go deep into one platform and live there for their technical eternity.

For me, the answer wasn't simple.  I come at cloud platforms (and cloud in general) as someone who tries to learn the basics of each camp and find what is unique to each.  Like many parts of technology, I view these as different tools for different jobs.  While most of them provide the bare basics (like virtual machines or virtual networks), many differ in how they approach and interact with their various audiences.

One of the primary reasons I wanted to go to OpenStack Summit came down to finding out who the primary audience of OpenStack is.  I want to be able to ask questions of those on the technical side as well as the business side.  One of the weird things I've noticed about myself when selecting sessions for OpenStack Summit is that plenty of my picks are labeled "Cultural and Organizational Change".  While the technology and inner workings of all the OpenStack projects fascinate me, I want to see how organizations are using OpenStack and what they intend to change internally.

Another reason, as I alluded to in the prior paragraph, is the technology within the ecosystem.  I know of many of the tried-and-true OpenStack projects (Nova, Cinder, Keystone, Neutron, Glance, and Swift), but I haven't had the opportunity to administer or work with OpenStack beyond mild attempts in a very closed lab scenario.  I want to see how the other projects are coming along and which ones might break into the core project realm, especially as business drivers dictate.

That last statement leads into my most important reason for attending OpenStack Summit.  We all hear about the business drivers for cloud adoption.  What I want to see and hear is how businesses came to see OpenStack as a way to achieve their goals.  Those of us from heavy virtualization backgrounds have likely heard about the complexities of standing up OpenStack, especially on your own, in the early stages of adopting open source software in an enterprise environment.

In the end, when you peel back the nuts and bolts of the various cloud platforms out there, you come to realize they all tend to look the same.  What differentiates them is their intended target markets.  For instance, Microsoft has been positioning Azure and Azure Stack as a hybrid answer for those running Microsoft-centric environments while also pursuing microservices architectures for cloud-native applications.  I want to see if OpenStack is maturing its message to enterprises while reducing the supposed complexities of standing up the platform, whether from source or from the ever-expanding vendor and partner integrations that are out there.

So, to those in my organization who feel I'm wasting my time going to OpenStack Summit, I'll leave you with some final thoughts.  Too often in this industry, especially at the operational level (along with some architects in the technical subject matter expert sense), we get caught up in only attending events that satisfy or maintain the subject matter expert parts of our brains.  My role asks me to see many different perspectives before trying to solve business problems.  To do this, we have to go to events that may not make sense on the surface or even provide any tangible benefit to business outcomes.  Sometimes we go purely out of curiosity or for information gathering.  I don't know about you, but I'd rather be armed with information than be one of the uninformed.


Cisco UCS Director and the PowerShell Agent – Part 5

In this blog post, we will finally work our way up to what I’ve tried to accomplish with a combination of PowerShell, Cisco UCS Director’s northbound APIs, and the concept of parallel processing with workflows in UCS Director.

The Assumptions and Caveats

First, let’s talk about the assumptions for this PowerShell script and UCS Director workflow. My current network architecture is one that models itself after one of Cisco’s validated designs for data centers. This design has a network spine (provided by the Nexus 7000 switch line) and network leaves (provided by the Nexus 5000 switch lines). As part of the configuration of this design, Cisco FabricPath is enabled across the switching lines.

With FabricPath in place, creating a simple L2 VLAN on each switch is relatively straightforward and lends itself easily to taking simple variables (mostly which switch the operation needs to be performed on). The only real difference between the two switch families, and the reason I need two different VLAN creation scripts, is the Nexus 7000: it forces me to specify the VDC (virtual device context) before running the small script that creates the VLAN and enables FabricPath mode on it.

Another facet of my architecture and current usage of these switches is that not all of the L2 VLANs were created with Cisco UCS Director. In response, I created two different workflows; we'll call them Nexus List Add and Nexus List Rollback Setup. Nexus List Add is used when the L2 VLAN is detected to NOT already exist on the switch. It runs a block of NX-OS CLI that creates the VLAN based on passed VLAN name and VLAN ID variables, enables FabricPath mode, and then saves the running-config to startup. Nexus List Rollback Setup, instead of trying to create a VLAN that already exists on the switch, registers a rollback task with UCS Director (for consistent VLAN removal across the entire FabricPath domain) and sets ownership of the VLAN in UCS Director for the customer in question. This ensures UCS Director knows about the VLAN and which customer owns it.

One last caveat concerns the PowerShell environment on my Cisco PowerShell Agent servers. I'll admit that I've been rather lazy since deploying them originally, and the PowerShell version on them is still 4.0. This causes issues with the large JSON responses returned by some of the UCS Director northbound APIs: in version 4.0, the ConvertFrom-Json cmdlet enforces a maximum response size. I was forced to use some borrowed code that adjusts the JSON serializer limits for my runspace and then creates a custom PowerShell object. Unfortunately, this process adds a lot of overhead. I've recently found that by upgrading my environment to PowerShell 5.0, the issues go away and I can drop the custom code for creating the PoSH object.
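
For reference, one common workaround (a sketch, and not necessarily the exact borrowed code I used) is to fall back to the .NET JavaScriptSerializer and raise its size limit before deserializing the response:

Add-Type -AssemblyName System.Web.Extensions   # provides System.Web.Script.Serialization
$serializer = New-Object System.Web.Script.Serialization.JavaScriptSerializer
$serializer.MaxJsonLength = [int]::MaxValue   # lift the default length limit that trips up large tabular reports
$parsed = $serializer.DeserializeObject($raw_json)   # $raw_json holds the raw API response text; returns nested dictionaries/arrays rather than PSCustomObjects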

The Code

The code is available at the following GitHub location: My Example Code

Declared Functions

I wrote two specific functions in my PoSH code to be reusable throughout this script. They are called Get-UCSDURIs and Connect-NestedPSSession. For reference, I've previously blogged about why I needed to create Connect-NestedPSSession (Using Invoke-WebRequest with the Cisco PowerShell Agent). Get-UCSDURIs was created so that I could go through my entire switching stack and generate all the specific URIs for calling the two Nexus workflows, filling in all the variables. Since I have 2 Nexus 7000s and 10 Nexus 5000s in this configuration, I need to generate 12 URIs to send to the UCS Director northbound API userAPISubmitWorkflowServiceRequest.

In Get-UCSDURIs, I also do a quick lookup of the workflow inputs using the northbound API userAPIGetWorkflowInputs. The reason is that even if the input names on the workflows are the same, UCS Director appends a specific number to each variable name to make it unique (example below from one of the JSON returns for Nexus List Add).

Screen Shot 2017-03-14 at 2.31.23 PM

A total of three parameters are passed to the API when executing userAPISubmitWorkflowServiceRequest.  The first, param0, is a string containing the name of the workflow you wish to execute.  The second is a lengthy string passed as param1 in the userAPISubmitWorkflowServiceRequest URI; most of the code in Get-UCSDURIs focuses on building this parameter.  Since all of this information goes into the URI (there is no request body), I could not create it as a JSON literal.  I had to build it as a large string object, which is why the code looks the way it does, with multiple uses of double quotes and backticks.  Lastly, we send a UCS Director service request ID (or SR ID) as param2.  In my case, I usually send an SR ID of -1, which means that each workflow call is independent and does not register rollback tasks with a parent workflow.  I handle rollbacks in a different way later, since I also want to use parallel processing when removing everything that I created.
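
To give a feel for the string building, here is a rough sketch of how one of those URIs might be assembled. The host name and input names are placeholders, and the exact opData layout shown here is an assumption; the real input names are the numbered ones returned by userAPIGetWorkflowInputs.

$base = "https://ucsd.example.local/app/api/rest?formatType=json"   # hypothetical UCS Director address
$wf_name = "Nexus List Add"
$inputs = "{`"list`":[{`"name`":`"VLAN_ID_1234`",`"value`":`"$vlan_id`"},{`"name`":`"VLAN_NAME_1235`",`"value`":`"$vlan_name`"}]}"   # assumed param1 shape
$uri = "$base&opName=userAPISubmitWorkflowServiceRequest&opData={param0:`"$wf_name`",param1:$inputs,param2:-1}"   # param2 of -1 keeps the call independent of a parent SR; a real call would also URL-encode this string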

The Main Script Body

I start the main script body by passing in multiple arguments from UCS Director.  Many of these are specific to my environment, but the gist is that I need to pass in which of my sites this needs to be created in, which VLAN ID is being requested, what the requested VLAN name is going to be, which UCS Director group I want to assign ownership to, and which parent UCSD SR ID I want to assign the rollback tasks to (as noted in the prior paragraph, this is almost always -1, but I wanted the function to be usable for other cases too, not just this specific one).

I’m also passing in specifics for the PowerShell Agent at the site in question (the PS Agent) and the username and password I want to use to initiate the nested remote PowerShell session using Connect-NestedPSSession.

With UCS Director and the northbound APIs, there is no Basic authorization header to work with.  UCS Director creates an API key for each login, accessible through the GUI.  To use that key, you need to store it in a hashtable in which one of the keys is labeled X-Cloupia-Request-Key.  I do this by creating an empty hashtable first, then using the Add method on that object to insert the key/value pair.  For the most part, this is the only header required for the northbound APIs I'm using in this script.
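
That header setup looks something like the following ($ucsd_api_key is a stand-in for wherever you keep the key copied from the UCS Director GUI):

$headers = @{}                                        # empty hashtable for the request headers
$headers.Add("X-Cloupia-Request-Key", $ucsd_api_key)  # the UCS Director API key from the GUI
# later passed on every call, e.g. Invoke-WebRequest -Uri $uri -Headers $headers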

After setting the header, I have everything I need to start making northbound API calls to UCS Director.  Before I initiate any specific workflow, I need to go through a couple of lists to see what needs to be worked with and to determine which subsequent workflow to call (Nexus List Add or Nexus List Rollback Setup).  I use the northbound API userAPIGetTabularReport to get these lists: the networking equipment (MANAGED-NETWORK-ELEMENTS-T52) and the VLANs that have already been pulled from the switch inventories into the UCS Director database (VLANS-T52).

After running through these APIs and parsing the responses (and filtering down to very specific information), we then begin cross-checking whether the VLAN exists on the equipment.  Depending upon whether it exists or not AND the model type of the equipment being checked, the switch information gets placed into one of four lists, each labeled with either a 5k or 7k string and whether it's a rollback or not.  These four lists are then processed and the URIs are generated using the Get-UCSDURIs function.  Lastly, all four URI string returns are smashed together into one large, comma-separated list, which should contain, depending upon the site, either 10 or 12 URIs to process.
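
The cross-check and bucketing logic is conceptually simple; a sketch of the idea is below (property names such as DeviceIP, Model, and VlanId are placeholders for whatever the tabular reports actually return).

$add_5k = @(); $add_7k = @(); $rollback_5k = @(); $rollback_7k = @()
foreach ($switch in $network_elements) {
    $exists = $existing_vlans | Where-Object { $_.DeviceIP -eq $switch.DeviceIP -and $_.VlanId -eq $vlan_id }
    if ($switch.Model -match "N7K") {
        if ($exists) { $rollback_7k += $switch } else { $add_7k += $switch }
    }
    else {
        if ($exists) { $rollback_5k += $switch } else { $add_5k += $switch }
    }
}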

We start the execution process by taking our URI list and initiating each call in a foreach loop.  I store each SR ID in a hashtable that I can use to keep an eye on the workflows I just initiated.  To monitor each of these requests, we use the northbound API userAPIGetWorkflowStatus, which returns a numerical code indicating the workflow's status.  If the status code indicates the workflow has reached some sort of stop, I remove that SR ID from the hashtable.  I also append the SR ID to a string called $sr_list, which stores all of the workflow SR IDs for rollback purposes and is returned to UCS Director.  Once all the SR IDs have been removed from the hashtable, the while loop exits and the script finishes.
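
A skeleton of that submit-and-poll pattern is shown below. The response field name (serviceResult) and the status codes treated as "stopped" are assumptions for illustration; check a live userAPIGetWorkflowStatus response for the real values. The actual script also wraps these calls in Connect-NestedPSSession, which is omitted here for brevity.

# assumes $uri_list, $base, and $headers were built earlier in the script
$active = @{}
foreach ($uri in $uri_list) {
    $result = Invoke-RestMethod -Uri $uri -Headers $headers        # submit one workflow call
    $active[$result.serviceResult] = $uri                          # assumed: serviceResult carries the new SR ID
}
$sr_list = ""
while ($active.Count -gt 0) {
    foreach ($srid in @($active.Keys)) {                           # snapshot the keys so we can remove entries mid-loop
        $status_uri = "$base&opName=userAPIGetWorkflowStatus&opData={param0:$srid}"
        $status = (Invoke-RestMethod -Uri $status_uri -Headers $headers).serviceResult
        if ($status -ge 3) {                                       # assumed: values of 3 and above mean completed/failed/cancelled
            $sr_list += "$srid,"
            $active.Remove($srid)
        }
    }
    Start-Sleep -Seconds 15
}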

The assumption is that all my API calls have completed (without critical stop) and that a Layer 2 VLAN has been pushed out, in a parallel processed way.

Parallel Processing

I haven't mentioned much about this concept up to this point, having focused on the code instead.  The reason I wanted to perform this in parallel is that each switch acts independently of the others.  While FabricPath requires you to create the VLAN on all the switches for traffic purposes, there's no major error checking that forces you to have it all created at once.

Taking this idea, you can see that instead of waiting through a long sequential order (switch1 -> switch2 -> … -> switch10), I can initiate all 10 requests at the same time.  This is what I mean by parallel processing.  Before I created this script, my entire process of creating an L2 VLAN was taking upwards of 10-12 minutes.  In a cloud world, that seemed extremely high, especially for something as simple as an L2 VLAN.

After this implementation, execution times are as low as three minutes at one site (which has fewer VLANs to parse through) and as high as five minutes at another (12 switches with 1,500 VLANs on them, which works out to roughly 175,000 lines of raw JSON).  Oh, and I forgot to mention I also ADDED new functionality to the overall workflow (these efficiencies were just one part of the major fixes requested).  I added automatic creation of vSphere distributed portgroups and the addition of those portgroups to a customer's network policy in their virtual data center.  I also added the ability to go through UCS Central, find any customer service profiles, and add the VLAN to the vNICs of each of those service profiles.  Included in all of this was the granting of permissions to our VMware hosted environment for those distributed portgroups.  So, I added some major automated steps AND drastically cut the execution time.

Conclusion

I hope you either read through the entire series or at least got something new and useful out of it.  I was amazed at just how little was being published, from either Cisco SMEs or the community, with regard to PowerShell and Cisco UCS Director.  I hope that you, the reader, have realized just how powerful the PowerShell capabilities can be and how easily you can extend systems that have no direct support in Cisco UCS Director.


Cisco UCS Director and the PowerShell Agent – Part 4

In this blog post, we will be going over some advanced use cases of PowerShell with Cisco UCS Director and the PowerShell Agent.  We will go over a scenario in which we need multiple bits of data returned from a PowerShell script, and how that can be handled with custom tasks that parse the XML data and return it as UCS Director macros/variables.

Real World Use Case

In my own environment, I had one great use case that made me start leveraging more data returns from PowerShell scripts.  In my lab and in production, we run Cisco UCS Central to provide a centralized repository for many Cisco UCS specific constructs (like MAC pools, service profile templates, vNIC templates, etc).  As we grew to multiple data centers, we started to worry about major overlap problems with pools in each Cisco UCS Manager instance, so we decided to start using UCS Central to provide and divide these entities up from a global perspective.

Cisco UCS Director includes a certain number of tasks that Cisco themselves authored.  Unfortunately, as with many out-of-the-box task implementations in UCS Director, they didn't quite fit everything we needed for our specific processes when it came to building UCS devices, whether for our own virtualization environments or for bare metal for our customers.

My main use case came from a limitation in the code of the task for cloning a global service profile template.  After upgrading UCS Central from 1.2 to 1.3, this task started to return the value of "undefined" for values like the MAC addresses assigned to our NICs or the HBA WWPN information.  We found that a delay now had to occur to properly pull this information from the cloned service profile and return it to UCS Director.

PowerShell Saves the Day

As with most of the Cisco UCS Director out-of-the-box tasks, you are unable to see the JavaScript/CloupiaScript code within.  This made it impossible to resolve the issue through the existing task (although a TAC case was logged about the issue).  We resorted to recreating the main functionality in PowerShell using the Cisco UCS Central module (available as part of the Cisco UCS PowerTool Suite).

The Code

A caveat before we continue.  This code was written well over a year ago, so some of it may have changed drastically, as the UCS Central PowerShell module has gone through revisions since those earlier iterations.  Also, you are going to notice that I hard-coded the password sent to the Connect-UcsCentral cmdlet.  Ask any security person about this practice and you'll likely get hit with a random object anywhere from the size of an eraser to that of a city bus.
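
If you want to avoid the hard-coded password, one common alternative (a sketch, not what my script does) is to cache a DPAPI-encrypted copy of the password on the PowerShell Agent host under the account the agent service runs as; the file path below is hypothetical.

# One-time setup, run as the PSA service account on the agent host:
# Read-Host -Prompt "UCS Central password" -AsSecureString | ConvertFrom-SecureString | Set-Content C:\Scripts\ucsc_password.txt
$ucsc_password = Get-Content C:\Scripts\ucsc_password.txt | ConvertTo-SecureString   # only decryptable by the same user on the same machine
$ucsc_credential = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $ucsc_username, $ucsc_password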

The Script

Import-Module CiscoUcsCentralPs

$ucsc_org = ($args[0].Split(";"))[1]   # Passing a UCS Director variable and getting the Central Org DN from it
$ucsc_account = ($args[0].Split(";"))[0]  # Passing a UCS Director variable and getting the UCS Central account (if multiples)
$uscs_gspt = ($args[1].Split(";"))[2]  # Passing a UCS Director variable and getting the Global Service Profile Template DN from it
$customer_id = $args[2]  # Passing a string for usage in creating the name of the service profile
$device_sid = $args[3]  # Passing a string for usage in creating the name of the service profile

$ucsc_username = "*Insert UserName to authenticate to Central*"
$ucsc_password = ConvertTo-SecureString -String "*Password for Central account*" -AsPlainText -Force
$ucsc_credential = New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $ucsc_username, $ucsc_password
$ucsc_conn = Connect-UcsCentral -Name "*Insert UCS Central hostname or IP*" -Credential $ucsc_credential   # Connect to UCS Central

$gsp_name = $customer_id + "-" + $device_sid    # Create combined global service profile name
$new_gsp = Get-UcsCentralServiceProfile -Dn $uscs_gspt | Add-UcsCentralServiceProfileFromTemplate -NamePrefix $gsp_name -Count 1 -DestinationOrg (Get-UcsCentralOrg -Dn $ucsc_org) | Rename-UcsCentralServiceProfile -NewName $gsp_name   # Create GSP from template and rename to remove "1" from end of name

Start-Sleep 15   # Sleep for 15 seconds to allow for UCS Central to process global pool values into GSP
$new_gsp = Get-UcsCentralServiceProfile -Name $new_gsp.Name   # Reload the service profile

$ucsd = @{}   # Create our hashtable to store values

# Create the hashtable values for the various parts of the global service profile to be used by later UCS Director tasks

$ucsd["VNIC1_MAC"] = ($new_gsp | Get-UcsCentralVnic -Name ESX_Mgmt_A).Addr   # MAC for Mgmt NIC/PXE Boot NIC, named ESX_Mgmt_A
$ucsd["VNIC2_MAC"] = ($new_gsp | Get-UcsCentralVnic -Name ESX_Mgmt_B).Addr   # Secondary MAC for Mgmt NIC, named ESX_Mgmt_B
$ucsd["VHBA1_WWPN"] = ($new_gsp | Get-UcsCentralvHBA -Name vHBA1).Addr   # WWPN of vHBA1, used for zoning, named vHBA1
$ucsd["VHBA2_WWPN"] = ($new_gsp | Get-UcsCentralvHBA -Name vHBA2).Addr   # WWPN for vHBA2, used for zoning, named vHBA2
$ucsd["VHBA1_WWN"] = ($new_gsp | Get-UcsCentralvHBA -Name vHBA1).NodeAddr + ":" + ($new_gsp | Get-UcsCentralvHBA -Name vHBA1).Addr  # WWN used for EMC initiator creation for vHBA1
$ucsd["VHBA2_WWN"] = ($new_gsp | Get-UcsCentralvHBA -Name vHBA2).NodeAddr + ":" + ($new_gsp | Get-UcsCentralvHBA -Name vHBA2).Addr  # WWN used for EMC initiator creation for vHBA2
$ucsd["ServiceProfileIdentity"] =  $ucsc_account + ";" + $ucsc_org + ";" + $new_gsp.Dn   # UCS Central Service Profile Identity, in UCS Director variable format

return $ucsd   # Return hashtable to UCS Director for processing with custom task

From the beginning, you'll notice that we must import the modules we wish to use.  The PowerShell Agent does not have full access to things like Windows profiles or scripts to load these into the runtime environment for us.  You must declare all the modules you wish to use (and they must be installed on the device in question) in every PowerShell script you want the PowerShell Agent to run!

Our next block of code brings in the arguments we sent to the script.  At the end of my last blog post, I explained how we can use the automatic $args array to pass arguments from Cisco UCS Director to our PowerShell scripts.  In this code, I'm passing in a total of four arguments, but I'm creating five PowerShell variables (all strings) from them.

Now, some object types in Cisco UCS Director are formatted in particular ways.  Take $args[0], which I'm sending to this PowerShell script.  You can tell from the use of the Split method, and how the string is split, that it's semicolon-delimited.  The format of the string (which I believe is how UCS Director represents UCS Central organization objects) looks like this:  1;org-root/org-XXXXX.  UCS Central organization objects appear this way to specify the registered UCS Central account ID in Director (the "1" in this example) and the organization's distinguished name (DN) in UCS Central.  So, from one argument, we can get two variables for use by this script.

After the argument section of the script, we create a PSCredential object and use it to log into UCS Central.  Next comes a line of code specific to my organization's naming convention for the service profile name.  That is followed by our second UCS Central-specific cmdlet, Get-UcsCentralServiceProfile.  From here we have an example of how objects can be passed between the Cisco UCS Central cmdlets using the PowerShell pipeline: the command gets the global service profile template (passed in as an argument), clones a global service profile from it, and renames the new profile to match our custom naming convention.

The code that fixed the issues we were having with the Cisco UCS Director out-of-the-box task is the next couple of lines.  Start-Sleep lets you put a hard wait in the execution of the script to allow background processes to occur.  Once we've waited 15 seconds, we re-read the profile information we just created.  This changed all the values that were listed as "undefined" to their proper values.

Returning the Information

The last part of this code focuses on a PowerShell object called a hash table.  A hash table is an object containing many dictionary entries, and each entry is made up of two properties:  a key (the term, in dictionary speak) and a value (the definition).  Using this knowledge, we can use a hash table to store multiple pieces of UCS Central global service profile information and hand it all back to UCS Director for parsing and processing.

You'll see in the code that we first declare our hash table.  From there, we can declare our keys and start storing values in the table.  You'll notice that some of the keys I chose to put in this hash table are address values from the vNICs and vHBAs in the service profile.  Lastly, I also created a value under the key ServiceProfileIdentity that will be returned to UCS Director.  That value is semicolon-delimited, in the format UCS Director expects for a UCS Central Service Profile Identity object.

Lastly, we tell the script to return the contents of the hash table.  At this point, the PowerShell Agent will create the XML response and specifically list all the hash table contents within the XML response.  We need to utilize some XML parsing on the Cisco UCS Director side to then store these values as macros for other parts of our UCS Director workflows.

Parsing the Response

Ages ago, I found a great example on the UCS Director Community site (UCS Director Community Site).  It laid the foundation for parsing through the response and getting out the variables I needed to create macros for use by UCS Director.

I downloaded the workflow example at the above URL and imported it into my UCS Director environment.  When I did this, the custom task used for parsing was automatically imported as well.  I cloned a copy of that task and started making edits to make it my own.  All of this can be found by navigating (as a system administrator) in UCS Director to Policies > Orchestration and clicking on the Custom Workflow Tasks tab.

We can start to edit this custom task by selecting the cloned task and clicking on the Edit button in the bar above it.  I usually skip the Custom Task Information section and proceed to the next section, Custom Task Inputs.  In this section, you can see the following:

Screen Shot 2017-03-01 at 4.21.09 PM

The input expected is going to be called xml.  We will be passing the output from the Execute PowerShell task to this input.

Moving along to the next screen, this is where the customization begins.  Knowing what we have for our key value names coming from our PowerShell hash table, we can create outputs based on those very names.  Here’s what I put for my values:

Screen Shot 2017-03-01 at 4.23.29 PM

Everything coming out of this task is going to be a generic text value, except ServiceProfileIdentity.  The reason is that my workflow includes a task that requires this specific object type as its input in order to operate on the service profile.

We skip past the controller section, as we are not going to be performing any marshalling on this task.  That leads us to this script:

importPackage(com.cloupia.lib.util);
importPackage(java.util);

var xml = input.xml;

// Parse the <Objects> section (should be a single section):
var objects_xml = XMLUtil.getValue("Objects", xml);

// Parse the objects list now (should also be a single section):
object_list = XMLUtil.getTag("Object", objects_xml.get(0));

// Parse the object_list to get properties:
property_list = XMLUtil.getTag("Property", object_list.get(0));

// PowerShell returns the hashtable to UCSD oddly: alternating rows of keys and values, e.g.:
//   ip
//   192.168.100.1
//   server_name
//   New Server

// Store output in a HashMap:
var variable_map = new HashMap();

// Store the previous key in a buffer:
var key_buffer = "";

// Loop through all values, taking even rows as keys and odd rows as values:
for (i = 0; i < property_list.size(); i++) {
    // Strip any remaining XML tags (can't seem to coax the XML library to do this for me!)
    property_list.set(i, property_list.get(i).replaceAll("<[^>]*>", ""));
    // Keys
    if ((i % 2) == 0) {
        key_buffer = property_list.get(i);
    }
    // Values
    else {
        variable_map.put(key_buffer, property_list.get(i));
    }
}

// Match the desired outputs to HashMap fields:
output.VNIC1_MAC = variable_map.get("VNIC1_MAC");
output.VNIC2_MAC = variable_map.get("VNIC2_MAC");
output.VHBA1_WWPN = variable_map.get("VHBA1_WWPN");
output.VHBA2_WWPN = variable_map.get("VHBA2_WWPN");
output.VHBA1_WWN = variable_map.get("VHBA1_WWN");
output.VHBA2_WWN = variable_map.get("VHBA2_WWN");
output.ServiceProfileIdentity = variable_map.get("ServiceProfileIdentity");

The section to focus on for our outputs is the last group of lines.  The code parses through the XML return, creates a JavaScript equivalent of a hash table (a HashMap), and then pulls the values out by key.  By assigning those values to the output variables in the script, we complete the last step needed to expose the information as UCS Director macros and, thus, pass it on to other parts of our workflow!

You can see here that I can take the MAC of my primary NIC for OS deployment and assign it as the NIC to use for PXE job creation on the Bare Metal Agent server by passing in the macro that I created:

Screen Shot 2017-03-01 at 4.32.19 PM

To Be Continued

The last part of this blog series will try to take the PowerShell Agent to another level by showing how you can use the PowerShell Agent service to perform northbound API calls to other systems OR even to UCS Director itself.  I'll show examples of recent enhancements I made to my workflows to enable parallel processing and gain massive efficiencies in the overall execution time of some of the tasks within my datacenter.


Observations on Blame Cultures and the S3 Outage

One would think that this was scripted the way it happened, but I can assure you, it was not.  I had been reading a book (a really good book on blame cultures; I highly suggest picking up a copy:  Here).  The day after I finished it, my tech social media feeds were aflame with mentions of problems with AWS (specifically the S3 service in the US-East-1 region).  Much has been said about the need for proper application architecture using cloud building blocks, and much reflection has happened on whether the cost of that resiliency is worth it to avoid a significant outage.  I fully expect that there are plenty of discussions happening within organizations about these very factors.

I found myself not necessarily focused on the incident itself.  I was more interested, strangely enough, in any sort of public post-mortem that would be brought forth.  Having read many DevOps books recently, the concept of a public post-mortem is not new to me, but I can guess that for many private organizations it could seem like a foreign concept.  When an incident occurs, many in the organization just want the incident to go away.  There's an extreme negative connotation associated with incidents and incident management in many organizations.  To me, post-mortems give great insight into how an organization treats blame.

Recently, I've been doing quite a bit of research into how organizations, specifically IT organizations, deal with blame.  Now, in Amazon's case, they listed "human error" as a contributing cause of the outage.  What comes after that in the post-mortem shows how Amazon handles blame internally.  These two quotes, taken from the post-mortem (available here:  https://aws.amazon.com/message/41926/), are telling:

At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.

We are making several changes as a result of this operational event. While removal of capacity is a key operational practice, in this instance, the tool used allowed too much capacity to be removed too quickly. We have modified this tool to remove capacity more slowly and added safeguards to prevent capacity from being removed when it will take any subsystem below its minimum required capacity level. This will prevent an incorrect input from triggering a similar event in the future. We are also auditing our other operational tools to ensure we have similar safety checks.

Pay attention to the key terminology in these two passages.  Notice that outside of one mention of the authorized S3 team member, every other mention has to do with the tools used to perform the action or the process that would have helped prevent the issue.  In this case, the root cause is NOT the operator who entered the command; it was the process that led to the input and the actions the system took based on the runbook operation.

So, why the long-winded breakdown of the S3 post-mortem?  It got me thinking about all the organizations I've worked for in the past, and it made me realize that when it comes to any sort of employment change, especially a role that requires on-call duty or primary production system ownership, I've got a perfect question to ask of a potential employer.  Ask that employer about their last major internal incident.  While you might not get a full post-mortem, especially if the organization doesn't believe in the benefit of such a document, key information about the incident and how it was handled should become immediately apparent.  If the incident was a human error, ask how the operator who performed the action was treated.

Unfortunately, in many IT organizations the prevailing thought is that the root cause can simply be declared to be an operator incapable of performing their role, and immediate termination is a typical reaction to the event.  If not immediate termination, you can rest assured that the organization will forever assign a hidden asterisk to your HR file and the incident will always be held against you.  Either way, this sort of thinking ends up causing more harm to the organization in the long term.  Sure, you think you removed the "bad apple" from the mix, but there will be collateral damage in the ranks of those who still must deal with the imperfect technical systems that need their "care and feeding" to function optimally.

Honestly, if this is the sort of response you get from a potential employer, I would end the interview right there and have no further discussions with that organization.  Based on their response to the incident, you can easily see that:

  • The organization has no real sense or appreciation for the fact that the technical systems IT staff works with on a day-to-day basis are extremely complex
  • Those systems are also designed such that updating or changing them is considered a mandatory operational requirement
  • When change occurs, you can never guarantee the desired outcome 100% of the time. Failure is inevitable.  All you can do is mitigate the damage failure can do to the system in question.
  • Putting the entire root cause on the operator is a knee-jerk reaction and precludes you from ever getting to the real root cause(s) of your incident
  • Levying a punishment of termination on the operator in question causes a ripple effect through the rest of the staff. The staff become less likely to accurately report incident information out of fear (for their employment, or of being branded a "bad apple").  This obscures root causes, which ultimately leads to more failures in the system.

Are you sure you want to work for an organization that prides itself on "just enough analysis" and on breeding a culture of self-interest and self-preservation?  No, me neither.  Culture matters in an organization and to those seeking opportunities within it.  It's best to figure out what the culture really is before realizing you've made a major mistake working for an organization that loves to play the name/blame/shame game.


Cisco UCS Director and the PowerShell Agent – Part 3

In this blog post, we will be discussing how we can use the Cisco UCS Director macros (also known as variables) from either workflow inputs or from task outputs.  We will also show how these macros can be passed as arguments to our PowerShell scripts through the PowerShell Agent.

UCS Director Macros

Cisco UCS Director uses a variable system so that inputs or outputs of tasks can be used in subsequent tasks.  In the Cisco documentation these are called macro variables, or "macros" for short.  Not only can these macros come from workflow tasks, there is also a slew of system macros available.  Orchestration within UCS Director allows you to use these macros not only for workflow inputs and task outputs, but also for many virtual machine-level annotations.

When you create a workflow, you can define inputs that are either entered by the person running the workflow or defined as admin-level inputs that require no manual entry.  You can access this functionality by navigating, in an admin-level UCS Director session, to Policies > Orchestration.  From the Workflows tab, click the +Add Workflow button to create a brand-new workflow.

screen-shot-2017-02-20-at-2-54-12-pm

 

Above is the first screen given to you.  From here, you can set some of the workflow settings, like name, description, and context.  You can also select some default behaviors of the workflow, like whether you want to send default email notifications to the user who initiated the workflow.  For the sake of this example, we'll just fill out the bare basics (Workflow Name, Folder to place the Workflow in).  Keep in mind that a workflow name CANNOT be duplicated, regardless of folder placement.  You will need unique names for all workflows!  Click Next to advance.

screen-shot-2017-02-20-at-2-58-00-pm

This next screen is where the workflow input magic happens.  By clicking on the + button below the “Associate to Activity” section, you will begin the process of adding a new workflow input to the workflow.

screen-shot-2017-02-20-at-2-59-15-pm

At a bare minimum, the only thing required for UCS Director to enable an input is to give it a label (extremely important for later!) and the input type.  The input type comes in handy when dealing with task inputs that require the input to be in a preformatted type.  We will show examples of this later.  For this post, we will just show creating a Generic Text Input type.

I put in a label of Test and clicked on the Select button.  From there, a listing of all input types is available to be searched through.  In the upper right hand corner of that screen, enter in “generic” and it will filter the listing and look like this:

screen-shot-2017-02-20-at-3-01-14-pm

Clicking on the checkbox for Generic Text Input will highlight the option.  If we click on the Select button, we will see that the workflow input should look like this:

screen-shot-2017-02-20-at-3-03-11-pm

You'll notice there are a couple of other checkboxes now available.  The first is the Multiline/MultiValue Input checkbox.  This option allows you to supply multiple comma-separated inputs and can be extremely useful with a task that can process multiple values.  Otherwise, you can process the list in a Start…End loop in the UCS Director Workflow Designer.  We will get into looping within a workflow in a future blog post.

The last option available is the Admin Input checkbox.  By checking this box, the admin can either select the object from the UCS Director database or enter a hardcoded value for this variable that cannot be changed.

If neither of these checkboxes is selected, the person executing the workflow will be presented with a field in which they have to enter their own text string.  Clicking the Submit button places this macro on the screen, and you can finish the workflow creation by clicking the Next button to move through User Outputs (these come in handy once you start implementing the concept of parent/child workflows, covered in later blog posts).  Lastly, click Submit to save the workflow.

Using the Macro in a Workflow Task

Now that we’ve registered a workflow input, we can pass it to any task that accepts a Generic Text Input type for input.  Open the Workflow Designer on this new workflow we’ve created.  Let’s drag a test task over to show this.  Since I like the Execute PowerShell Command task, I’ve dragged that over and have begun filling out the task and advanced the screens to the User Input Mapping section.

screen-shot-2017-02-20-at-3-11-13-pm

In this example, we can see that the PowerShell Agent field takes an input of type Generic Text Input.  You can click the Map to User Input checkbox, and the User Input drop-down will list all Generic Text Input macros available from either workflow inputs or other task outputs.  Since we have no other task outputs right now, the only macro to choose from is our previously created Test macro.

We can also use this macro as an inline macro for a text field.  If we click on Next, we can advance to the Task Inputs screen.  You can put the value inline by referencing the macro in the following format:  ${<macro name>}

In this case, we will place ${Test} into one of the fields.

screen-shot-2017-02-20-at-3-15-55-pm

The Label field will now automatically use whatever value is entered by the workflow executor.

Passing Macro Information to PowerShell

Now that we've shown how to put a macro value into inline fields, we can use this approach to pass arguments into PowerShell scripts.  From this same task, let's say there is a PowerShell script called "HelloWorld.ps1" and I need to pass the Test macro to it for processing.  In the Command/Script field, I would put the following:

screen-shot-2017-02-20-at-3-19-06-pm

This is a very primitive way to pass arguments to a PowerShell script.  Inside my script, to use this value, you could easily store this string information with a single command using the $args array.  You could do this like so:

screen-shot-2017-02-20-at-3-22-39-pm

You can pass many more macros this way; just remember the position of each argument.  From there, you can take the information in those macros and use any of the PowerShell capabilities at your disposal.
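
For anyone who can't make out the screenshots, here is a minimal sketch of the two pieces together; the script path is hypothetical.

# Command/Script field in the Execute PowerShell Command task:
#   C:\Scripts\HelloWorld.ps1 "${Test}"

# HelloWorld.ps1
$test_value = $args[0]              # first positional argument carries the value of the Test macro
Write-Output "Hello, $test_value"   # do something with the passed-in value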

To Be Continued…

In the next blog post, we will explore some more advanced techniques using PowerShell.  One of the use cases I've found most useful is returning multiple values from a PowerShell script and storing them in multiple task outputs for future use, using PowerShell hash tables and UCS Director CloupiaScript XML parsing in the form of a custom task.


Cisco UCS Director and the PowerShell Agent – Part 2

In this blog post, we will be discussing how to utilize the Cisco PowerShell Agent and the provided Cisco UCS Director task, Execute PowerShell Command.  We will also go over what it takes to parse the response of this task and retrieve information to be used as Cisco UCS Director variables for other tasks in our workflow.

Execute PowerShell Command

First things first, we need to create a new workflow to begin using this task.  You can easily navigate to the Workflow Designer by using the menus while logged in as an administrator.  Navigate to the following location:  Policies > Orchestration, and make sure you are on the Workflows tab.  Create a new workflow from the menu options on these screens.

Once you've created the workflow, enter the Workflow Designer.  Along the left side of the window, you should be able to see what sort of tasks are available to be placed in the designer.  In the text entry field near the top, go ahead and enter the word "PowerShell".  You will find the Cisco-created task under the Cloupia Tasks > General Tasks folder.  Click on the task and drag it to the designer layout portion of the screen.  Once you've done that, double-click the task to begin editing it.

You can proceed right through the section for User Input Mapping, as we don’t have any sort of inputs we are assigning to required values of the task.  Proceed to the “Task Inputs” section of the task edit process.  You should see something like this:

screen-shot-2017-02-10-at-3-42-29-pm

As you can see, I have already entered some of the values for this task.  This looks very much like what we entered in the last blog post (Cisco UCS Director and the PowerShell Agent – Part 1).  The only major difference is that there is a PowerShell Agent selection box, populated with the different PowerShell Agents we've registered with UCS Director.

One of the other major differences is that the screen has a rather lengthy scrollbar.  Scrolling down, we can see that there are some other entries that can be made.  For instance, you can specify a rollback for this task, in the form of calling another script.  This comes in handy for cleaning up whatever was added or changed in your environment.  As a good example, if you use PowerShell to perform operations in Cisco UCS Manager, then when you roll back the workflow and remove those services, you would need to undo the changes you just made.  If you created a service profile and associated it with a blade server, you'd want to disassociate the service profile and delete it when that service is no longer necessary.
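
As a rough illustration of what such a rollback script might contain (the module name and cmdlet behavior can vary by UCS PowerTool version, and $ucsm_host, $cred, and $sp_name are placeholders):

Import-Module Cisco.UCSManager                                            # Cisco UCS PowerTool module for UCS Manager
$ucsm = Connect-Ucs -Name $ucsm_host -Credential $cred                    # connect to the UCS Manager instance
Get-UcsServiceProfile -Name $sp_name | Remove-UcsServiceProfile -Force    # removing the service profile also releases its association with the blade
Disconnect-Ucs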

screen-shot-2017-02-10-at-3-45-46-pm

Other key parts of the task inputs include these last options:

screen-shot-2017-02-10-at-3-50-03-pm

The task lets you specify how you want to handle the task output.  Until recently, the only output format available was XML; since UCS Director 6.0 was released, an option to return the output in JSON format has been available.  The Depth option comes in handy for the JSON format.  The last component is the Maximum Wait Time, which is very important in determining how long UCS Director keeps an eye on this task before it automatically stops checking on it.  Before setting up this task, it's highly recommended to work out how long you expect the script to take and account for some extra time.

Lastly, pay attention to this final output variable:

screen-shot-2017-02-10-at-3-53-03-pm

When it comes to parsing the output of the script, this is the value we need to pass to a parsing task to retrieve information for other UCS Director tasks in our workflow.  Note that this comes back as UCS Director’s implementation of a generic text input object.

Parsing the Response

As of UCS Director 6.0, a new Cisco-created task called Parse PowerShell Output is included in UCS Director.  This task is relatively decent at retrieving simple values from the returned text and creating a single UCS Director variable.  To work with it, drag the task into the Workflow Designer.  Upon reaching the User Input Mapping section of the task, we need to map its input to the output of our Execute PowerShell Command task.  You should be able to find that output in the drop-down menu when you select that you want to map this object to a user input.  It should look something like this:

screen-shot-2017-02-10-at-3-59-31-pm

In the output section, you’ll see the following values that should be available after processing the text we are giving to this task:

screen-shot-2017-02-10-at-4-01-45-pm

These variables will store parsed information from our PowerShell script and allow us to use those values as inputs to other UCS Director tasks.

Caveats

If you've worked with PowerShell, you can easily see that this task only handles a single key/value pair.  If you are attempting to return many pieces of information, that's going to be a problem, and this is where some custom task authoring comes in handy.  I would highly suggest the examples on the UCS Director communities site (UCSD Workflow INDEX).  Armed with some of these older workflows, you can go through the CloupiaScript/JavaScript code to see how the XML return can be parsed and all values returned, especially if you are returning a PowerShell hashtable.

To be continued…

In the next blog post, we continue the discussion of how to send arguments to your PowerShell scripts…
