Multi-tier applications: OneFlow

A framework to build and manage elastic services, made up of many VMs and organised in different tiers. A detailed explanation and a working example are provided.

In OpenNebula it is possible to define multi-tier applications, i.e., a collection of virtual machines that is seen and managed as a single entity (or service). Within a service, a user can define a hierarchy (which VM should start first), elasticity rules (how to scale out the infrastructure) and send cumulative commands to the member VMs.

Important notes: before starting with the description, a couple of remarks:

  • OneFlow works together with OneGate, please be sure to have read and understood how this facility works;
  • OneFlow can also remove (shut down) VMs, as detailed later. The criterion is very simple: the oldest one (i.e., the one with the lowest numerical ID) is chosen. This may sound arbitrary, but it is reasonable: all VMs belonging to a role should be interchangeable, and none should stand out. If a VM has a particular task, it should be assigned to a different role.

The OpenNebula component providing all of this is OneFlow, accessible from the main menu of ONE's GUI

OneFlow in the main menu

where there are two items:

  • Services: the multi-tier application(s) currently running or instantiated;
  • Templates: the description of the service(s).

In the next sections we will explain how to set up a template and instantiate it. The example we consider is an application made up of two VM classes: a frontend and multiple backends. The service envisages only one frontend and up to 5 backends, to be started (or shut down) dynamically depending on the workload.

Define a service template

A service template contains

  • Roles: the VM's type or task. A role is associated with a VM template; that is, VMs with the same role are launched using the same template (picked from the list of templates accessible to the user);
  • Hierarchy: the dependencies between roles, i.e., which VMs should be started first;
  • Elasticity rule(s): the criteria to start new VMs (scale out) or to shrink the infrastructure (scale in).

Clicking on OneFlow > Templates from the side menu, the list of service templates opens.

OneFlow template menu

Click on the green plus (+) button on the top left to define a new template. The first information to enter is the name, as in the following picture (green box), plus an optional description.

OneFlow template for the FE

We can right away define the first role, Frontend, filling the tab under Roles:

  • Role Name: a label to identify the role, i.e. Frontend in our case;
  • VMs: the number of VMs to start with when deploying the template;
  • VM template: the template of the VM to use for this role, to be picked from the list of VM templates accessible to the current user.

Since in our example the frontend does not need to scale, the Role Elasticity and Advanced Role Parameters sections can be ignored for now; we can move on to define the basic configuration of the backend VMs. To do that, just click on the Add another role button (red square).

The initial steps are the same we already saw for the frontend: choose a name (Backend), select the number of VMs to start with (again just one) and pick a template for this role. Since this is the second role being defined, it is also possible to introduce dependencies. These can also be seen as a hierarchy of roles, i.e., the order in which to start the different VMs. In fact, a new set of checkboxes is available, Parent roles, listing all the other roles available (see the green box in the following picture).

OneFlow template for the BE

Since here we're defining the backend, it is reasonable to assume that such a VM should be started after the frontend. For this reason, the parent role Frontend should be checked in the Backend definition. The behaviour of a dependency can be further detailed in the Advanced Service Parameters (yellow box in the previous picture, detailed in the next one).

OneFlow Advanced Service Parameters

Here we have 3 items to configure:

  • Strategy: whether the dependencies defined are respected (the default value Straight) or the VMs are started regardless of the defined dependencies (value None);
  • Shutdown action: what to do when a VM has to be removed from the service. By default Shutdown is used, but it is possible to switch to the hard version of the command (i.e., not sending the ACPI signal);
  • Wait for VMs to report that they are READY: instead of starting the VM when the parent reaches the running state, OneFlow checks OneGate for the parameter READY, which has to be set to YES by the user (or by some script).
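The READY flag just described is set through OneGate. As a sketch, assuming the usual OneGate contextualisation variables (ONEGATE_ENDPOINT, ONEGATE_TOKEN, VMID) are available inside the VM, the request could be built like this (the endpoint URL below is a placeholder; the headers mirror the curl commands shown later in this article):

```python
import urllib.request

# Sketch only: a VM flagging itself as READY through OneGate, the Python
# equivalent of the curl calls used later in this article. Endpoint, token
# and VM id are placeholders standing in for the OneGate context variables.
def ready_request(endpoint: str, token: str, vmid: str) -> urllib.request.Request:
    req = urllib.request.Request(
        f"{endpoint}/vm", data=b"READY = YES", method="PUT"
    )
    req.add_header("X-ONEGATE-TOKEN", token)
    req.add_header("X-ONEGATE-VMID", vmid)
    return req

req = ready_request("http://onegate.example:5030", "secret-token", "42")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req)  # send only from inside a service VM
```

With this strategy, OneFlow starts the child role only after the parent VM (or a boot script inside it) has issued such a request.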

Backend VMs should be elastic: OneFlow should be able to add or remove VMs belonging to this role automatically, according to given criteria. These are specified by means of the Role Elasticity sub-menu, in the red box of the above picture.

Role Elasticity is the core of a service: it defines the logic to scale the application out or in. For example, our approach could consist of adding one more backend VM (until the maximum number of running instances, 5, is reached) every time the measured load is greater than 100 for 5 minutes. On the other hand, when the load is below 10 for 5 minutes, one VM is removed. The following picture shows how this is implemented.

OneFlow Role Elasticity

Let's start to review the parameters that apply to all rulesets.

  • Min VMs: the minimum number of VMs that should be always present for this role (usually 1);
  • Max VMs: the maximum number of VMs allowed for this role. This is the maximum capacity;
  • Cooldown: after an elasticity policy (see later on) is triggered, OneFlow stops evaluating the rule(s) for a certain amount of time, the Cooldown. By default it is 300 seconds (5 minutes), but it can be set here for all the rules specified for this role. The reason to have a cooldown interval is to temper the effect of transients, avoiding unnecessary actions and giving the control parameter enough time to adjust to the new infrastructure (i.e., more VMs to share the load with). If a specific rule defines its own Cooldown period, it overrides both this general value and the default.

The rest of the form gives the possibility to enter the rules that OneFlow should apply to the service. They can be classified as:

  • Elasticity policies: they are event-triggered, a parameter is monitored and if a certain threshold is met, then OneFlow takes a certain action (deploy a new VM or shutdown an old one);
  • Scheduled policies: the action described here (deploy a new VM or shutdown an old one) is time-triggered, i.e., carried out at a certain moment or at regular intervals.

In more detail, the fields to fill in to define an Elasticity policy are:

  • Type: the action that should be carried out by OneFlow when this particular rule is triggered. This parameter works together with the Adjust box (detailed later on), since the latter provides the magnitude. In other words, an action consists of Type and Adjust. The user can choose among 3 different values for Type:
    • Change: add or remove the number of VMs specified in the Adjust box;
    • Cardinality: set the total number of VMs to what is specified in the Adjust box;
    • Percentage: add or remove VMs, interpreting Adjust as a percentage, to be applied to the current total number of VMs. The minimum adjustment that is applied is contained in the input box Min.
  • Adjust: the numerical parameter (or magnitude) that defines an action together with the Type. It has to be an integer and its meaning varies according to the value of Type:
    • Change: the number of VMs that should be added or removed. In case of removal, just specify a negative integer;
    • Cardinality: the total number of VMs that should be present as a result of the triggered rule. In this case only positive numbers are meaningful;
    • Percentage: the VMs that should be added or removed, expressed as a percentage of the current total number of VMs. It should be an integer between -100 and 100; in case of removal, just specify a negative number. The minimum adjustment that is applied is contained in the input box Min.
  • Min: in case the action Type is Percentage, OneFlow deploys or shuts down at least this minimum number of VMs. In case the number of VMs resulting from the percentage in Adjust (with respect to the current total number of running VMs for this service) is below this threshold, it is rounded up to Min;
  • Expression: the condition that triggers the current rule. It is simply an expression (>, <, ==, >=, <=) involving a parameter and a numerical threshold. The parameter value should be updated by all the VMs belonging to this role via OneGate. OneFlow reads the same facility and executes the rule if at least one of the VMs in this role satisfies the condition. The same rule is not evaluated again for a period of time equal to the Cooldown period (in the following order of precedence: the specific one for this rule, the general one for this service template, the default one). In our example, the expression that triggers the deployment of a new VM is LOAD > 100, while the shutdown of a VM happens when LOAD < 10;
  • #: the number of Periods during which the Expression should be satisfied in order to trigger the current rule;
  • Period: the time interval that is used by OneFlow to check the value of the Expression. It is a sort of sampling time. The rule is triggered if the Expression evaluates to true for a contiguous number of Periods equal to what is specified in #.
  • Cooldown: as explained before, this is an inactivity period after a rule is triggered, to temper the effect of transients. This value overrides the default and what is specified at the beginning of the form for the current role. Of course, it should be greater than the observation time window required to trigger the rule.
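The interaction of Type, Adjust and Min can be summarised with a little arithmetic. The following sketch is my own reading of the fields described above, not OneFlow source code, and the function name is invented for illustration:

```python
# Illustration of the Type / Adjust / Min semantics described above; the
# function name and rounding details are assumptions, not OneFlow code.
def resulting_cardinality(current: int, type_: str, adjust: int,
                          min_step: int = 1) -> int:
    if type_ == "CHANGE":          # add or remove Adjust VMs
        return current + adjust
    if type_ == "CARDINALITY":     # set the total to Adjust
        return adjust
    if type_ == "PERCENTAGE":      # Adjust is a percentage of the current total
        delta = current * abs(adjust) // 100
        delta = max(delta, min_step)   # at least Min VMs are added/removed
        return current + delta if adjust > 0 else current - delta
    raise ValueError(type_)

print(resulting_cardinality(4, "CHANGE", 1))           # 5
print(resulting_cardinality(4, "CARDINALITY", 2))      # 2
print(resulting_cardinality(4, "PERCENTAGE", 50))      # 6
print(resulting_cardinality(10, "PERCENTAGE", -5, 1))  # 9 (rounded up to Min)
```

The last call shows the role of Min: 5% of 10 VMs would be zero, so the removal is rounded up to one VM.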

Regarding the Scheduled policies, the interface proposes the following fields:

    • Type: same meaning as Elasticity policy;
    • Adjust: same meaning as Elasticity policy;
    • Time format: it works together with Time expression and it specifies the type of the Scheduled policy. This is a drop-down list consisting of two possible entries:
      • Start time: the action should be executed at a particular time;
      • Recurrence: the action should be performed at regular intervals.
    • Time expression: it defines when or how regularly the action should be performed. The content depends on the value of Time format:
      • Start time: a fixed point in time, in the format YYYY-MM-DD hh:mm:ss, where
        • YYYY is the year (4 digits);
        • MM is the month (2 digits prepended by 0 if needed, i.e., 01-12);
        • DD is the day (2 digits prepended by 0 if needed, i.e., 01-31);
        • hh is the hour (2 digits, in 24 hours format, prepended with 0 if needed, i.e., 00-23);
        • mm is the minute (2 digits, prepended with 0 if needed, i.e., 00-59);
        • ss is the second (2 digits, prepended with 0 if needed, i.e., 00-59);

        When hour, minute, or second is not specified, it defaults to 00.

      • Recurrence: a repetition criterion expressed using the Linux crontab syntax. Five fields (or columns) are needed, space-separated, in the following order:
        • minute: valid values are 0-59;
        • hour: from 0 to 23;
        • day of the month: from 1 to 31;
        • month: from 1 to 12 or a string with the first 3 letters of the English name of the month, i.e., Jan ... Dec (case insensitive);
        • day of the week: valid values are 0-7, where 0 and 7 both correspond to Sunday. The first 3 letters of the English name of the day also work, case insensitive.

When the value of a field is not meaningful, please do not skip the column; use the character * instead. It acts as a wildcard and means every. For example, to execute a policy on the first day of every month at 8 am, use the string 0 8 1 * *. Please note the * in the fourth field (dedicated to the month), which is a placeholder for every month.

In each field it is possible to define numerical ranges using a hyphen (-). The boundaries of the range are included.
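To make the wildcard and range rules concrete, here is a minimal checker for the numeric part of the five-field syntax described above (helper names are mine; it also accepts the comma-separated lists that standard crontab allows, and it deliberately ignores the 3-letter name forms):

```python
# A small checker for the five-field crontab syntax described above
# (minute, hour, day of month, month, day of week). Numeric fields only.
def field_matches(field: str, value: int) -> bool:
    if field == "*":                  # wildcard: matches every value
        return True
    for part in field.split(","):     # crontab also allows comma lists
        if "-" in part:               # inclusive numerical range, e.g. 1-5
            lo, hi = map(int, part.split("-"))
            if lo <= value <= hi:
                return True
        elif int(part) == value:
            return True
    return False

def matches(expr: str, minute: int, hour: int, dom: int,
            month: int, dow: int) -> bool:
    fields = expr.split()             # minute hour dom month dow
    return all(field_matches(f, v)
               for f, v in zip(fields, (minute, hour, dom, month, dow)))

# "0 8 1 * *": the first day of every month at 8:00
print(matches("0 8 1 * *", 0, 8, 1, 6, 3))   # True
print(matches("0 8 1 * *", 0, 9, 1, 6, 3))   # False
```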

Going back to our example, the rule deploy a VM if the load is greater than 100 for 5 minutes (300 seconds) is implemented in the following way:

  • an elasticity policy is needed, rather than a scheduled one
  • Type is set to Change, since we want to deploy a VM each time an event happens;
  • Adjust is 1, since this is our basic step;
  • the Expression is LOAD > 100, where LOAD is a variable stored in OneGate and managed through the same facility by the VMs belonging to the Backend role;
  • the timespan during which the Expression should hold is 5 minutes, which we split into 5 intervals (#) of 60 seconds each (Period). This means OneFlow samples OneGate every minute to check the value of the variable, triggering the event if the condition is satisfied 5 times in a row. In this way it is easier to see the progression of the event at runtime (explained later).
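The period-counting behaviour just described can be sketched as follows (a toy simulation of the rule, not OneFlow code): the counter advances on every sample that satisfies the expression and resets on any miss, so the rule fires only after 5 consecutive hits.

```python
# Toy simulation of the period counter: LOAD > 100 must hold for 5
# consecutive 60-second samples before the scale action fires.
def first_trigger(samples, threshold=100, periods=5):
    """Return the index of the sample at which the rule fires, or None."""
    streak = 0
    for i, load in enumerate(samples):
        streak = streak + 1 if load > threshold else 0   # reset on any miss
        if streak == periods:
            return i
    return None

loads = [120, 130, 90, 110, 115, 120, 125, 130]  # one dip resets the counter
print(first_trigger(loads))  # 7 -> fires on the 8th sample
```

Note how the dip to 90 restarts the count: a single sample below the threshold is enough to cancel the progression, which is exactly why transients do not trigger scaling.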

The other rule, shrinking the application, is implemented in the same way. The only difference is the Adjust parameter, which is negative, meaning remove the given number of VMs. As mentioned at the beginning, there is only one removal criterion: the oldest VMs in the role are picked.
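Put together, the whole example corresponds to a JSON service template roughly like the one below. This is a hedged sketch: the field names follow my reading of the OneFlow service template format, and the VM template ids (0 and 1) are placeholders for the real templates chosen in the GUI.

```python
import json

# Sketch of the two-role example as an OneFlow-style JSON service template.
# Field names are my reading of the format; vm_template ids are placeholders.
template = {
    "name": "frontend-backend",
    "deployment": "straight",            # respect the role hierarchy
    "roles": [
        {
            "name": "Frontend",
            "cardinality": 1,            # VMs to start with
            "vm_template": 0,            # placeholder id
        },
        {
            "name": "Backend",
            "cardinality": 1,
            "vm_template": 1,            # placeholder id
            "parents": ["Frontend"],     # start after the frontend
            "min_vms": 1,
            "max_vms": 5,
            "cooldown": 300,             # seconds
            "elasticity_policies": [
                {"type": "CHANGE", "adjust": 1,      # scale out
                 "expression": "LOAD > 100",
                 "period_number": 5, "period": 60},
                {"type": "CHANGE", "adjust": -1,     # scale in
                 "expression": "LOAD < 10",
                 "period_number": 5, "period": 60},
            ],
        },
    ],
}
print(json.dumps(template, indent=2))
```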

Run a service

It is finally time to launch the service just defined. Choose the corresponding template from the list and click on the Instantiate button on the top right corner (bordered in green).

OneFlow template list view

A dialogue box will open:

OneFlow instantiate dialogue box

It is possible to assign a name to the instance and decide how many instances to deploy. The latter is the number of copies of the whole service (each made up of the initial number of VMs defined for every role), not the number of VMs.

In order to view the service instance, click on the OneFlow > Services entry in the main OpenNebula menu on the left

OneFlow lateral menu

and the list of services belonging (or visible) to the current user will appear, together with their status. Please note that this is the status of the service, not that of the VM(s) deployed by the orchestrator. Though the idea is similar, the list of service statuses is shorter:

  • Pending: waiting to find proper resources for the VM(s) in the template;
  • Deploying: instantiating the VM(s);
  • Running: all the VMs are up;
  • Undeploying: the service is shutting down (the VMs are being shut down);
  • Warning: some of the VMs encountered an error;
  • Done: terminated successfully. The service can't be resumed; it has to be started again from its template;
  • Failed (deploying, scaling, undeploying): an operation terminated with an error;
  • Scaling: a rule has been triggered, modifying the number of VMs of the application;
  • Cooldown: inactivity period, without rule evaluation, after an event has been triggered.

OneFlow Services overview

On the top right corner there are 3 buttons:

  • Shutdown: all the VMs belonging to the service will be shut down, according to the action chosen in the template, that is Shutdown (default) or Shutdown hard. This is valid only when the service status is either Running or Warning. The final status of the service is Done;
  • Recover: depending on the status of the service, this button performs the actions detailed in the table below;
  • Delete (the red trash icon): delete the service and all the associated VMs.
Service status       Action of the Recover button
Failed deploying     Continue deploying the service
Failed scaling       Continue scaling the service
Failed undeploying   Continue the shutdown of the service
Cooldown             Ignore the cooldown period and evaluate the elasticity rule(s) again
Warning              Delete failed VMs and instantiate new ones

Clicking on the service, more details are shown. The first tab of the Service detailed view is named Info.

OneFlow Service Info tab

It offers some basic information and a grid to change the permissions of the service.

The Roles tab (in green) is more interesting.

OneFlow Service Roles tab

First of all, there is a list of the roles defined in the service template, their status, the number of VMs currently assigned (Cardinality) and the parent role(s), if any.

Clicking on a role, the bottom part of the screen is filled with the VMs belonging to that role.

OneFlow Service Role details tab

The identifier and the status of each single VM are reported, together with the IP and the VNC icon. Once a VM is selected, the usual operations (suspend, power off, stop, undeploy, ...) can be applied by means of the toolbar highlighted in blue.

Going back to the top part, when a role is selected the toolbar bordered in yellow becomes available. It is used to apply a certain action to all the VMs belonging to the specific role. This behaviour can be tuned by means of the input fields:

  • Period: the time interval, in seconds, between two consecutive operations;
  • Number: the selected action will be applied only to the specified number of VMs belonging to the selected role.

The green +Scale button in the upper left corner is also active. It is easy to guess that it is used to scale (out or in) the highlighted role manually. After clicking on it, the following window opens:

OneFlow Service Role manual scale

The user has to enter the desired Cardinality, that is, the number of VMs for that particular role that should be available at the end of the scale operation. The Force option is used to ignore the minimum and maximum number of VMs allowed for the role, entered when setting up the elasticity rule(s) in the service template.

Of course, the most interesting part of OneFlow is the auto-scaling capability. The Backend role has an elasticity rule that forces the deployment of a new VM when the LOAD parameter is greater than 100 for 5 periods. It is possible to test this by connecting to a VM with the Backend role and, as explained in the OneGate example, typing the following (after the proper setup): curl -X "PUT" "${ONEGATE_ENDPOINT}/vm" --header "X-ONEGATE-TOKEN: $ONEGATE_TOKEN" --header "X-ONEGATE-VMID: $VMID" -d "LOAD = 120". As shown in the following picture (green highlight box), OneFlow detects the change, shows the current value of the LOAD parameter in OneGate (120) and starts to count the number of periods in which the condition holds (in the picture, 2 out of 5, as set in the template).

OneFlow Service Role periods counter

If the condition holds for the predetermined number of periods, then the application scales. We can see that the status of the service, for the Backend role, switches to Scaling (green box, next picture) and a new VM is instantiated (yellow box), again as a Backend. The variable is still at 120 (blue box).

OneFlow Service Role auto scaling

Once the new VM is running (yellow mark, following picture), the service status is Cooldown (green box).

OneFlow Service Role cool down

The orchestrator will stop evaluating the rules for a certain interval of time. In the meantime, it is wise to lower the value of the LOAD variable, so as not to trigger another VM. Something like curl -X "PUT" "${ONEGATE_ENDPOINT}/vm" --header "X-ONEGATE-TOKEN: $ONEGATE_TOKEN" --header "X-ONEGATE-VMID: $VMID" -d "LOAD = 20" on the first Backend VM will work. A value lower than 10 would instead trigger the removal of a Backend VM.

The update of the variable connected to the elasticity rule(s) should be taken care of automatically, by means of a script, for example. Remember that, if the Backend role is made up of more than one VM (and all of them update the variable), then it is enough that one of them satisfies the rule to trigger the corresponding event.
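Such an updater script could look like the sketch below: it reads the 1-minute load average and pushes it to OneGate once per minute, mirroring the curl commands above. The scaling factor is an arbitrary choice for illustration, and the endpoint, token and VM id are assumed to be available as environment variables inside the VM.

```python
import os
import time
import urllib.request

# Sketch of an automatic LOAD updater for a Backend VM. The 100x scaling
# of the load average is an arbitrary illustrative choice.
def load_payload(scale: float = 100.0) -> bytes:
    load1, _, _ = os.getloadavg()         # 1-minute load average
    return f"LOAD = {load1 * scale:.0f}".encode()

def push_load(endpoint: str, token: str, vmid: str) -> None:
    req = urllib.request.Request(f"{endpoint}/vm",
                                 data=load_payload(), method="PUT")
    req.add_header("X-ONEGATE-TOKEN", token)
    req.add_header("X-ONEGATE-VMID", vmid)
    urllib.request.urlopen(req)           # only works from inside a service VM

if __name__ == "__main__" and "ONEGATE_ENDPOINT" in os.environ:
    while True:                           # run forever inside the service VM
        push_load(os.environ["ONEGATE_ENDPOINT"],
                  os.environ["ONEGATE_TOKEN"],
                  os.environ["VMID"])
        time.sleep(60)                    # one update per sampling Period
```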

A legitimate question would be: what happens if both rules are triggered? The one that is evaluated first fires, and then the cooldown period starts. The cooldown plays an essential role here. As already mentioned, it should be long enough (longer than the hold time of the event's condition) to eliminate transients. During the cooldown, the application should be able to adjust itself, redistribute the load and return to normal operation. If this is not the case, it is scaled again. If the event trigger (LOAD in our example) is meaningful, is updated automatically and in a timely fashion, and is read at sensible intervals (see the Period parameter of the elasticity policy), then the application will not behave too erratically. Everything is a matter of balance, so we invite users to experiment in order to find the right trade-off (and to contact us with any questions).