This article describes and tests the idea of using AWS Step Functions to create customer-specific IVR logic. An IoT use case is used to illustrate the customer need and the solution. Cost and architectural considerations are also covered.

Demo of IVR implemented using AWS Step Functions

IVR is an abbreviation of Interactive Voice Response, the kind of system that automatically answers an incoming call and lets you navigate through a set of choices before your call finally gets connected.

Typically, an IVR flow is customer specific, which means that if you run a multi-tenant environment you must be able to define different navigation flows for different phone numbers and customers.

Modern application development based on microservices mandates an architecture with loosely coupled services that are independently deployable. Ideally these services are built upon serverless technology to ensure a scalable and cost-efficient implementation.

AWS Step Functions is a service that coordinates the execution of components in a distributed application. This is typically needed for a service that is composed of multiple microservices. It provides a serverless environment where workflows are defined and executed. An AWS Step Functions workflow is expressed as a state machine in which tasks are executed in the different states. This is a great fit for IVR services, since an IVR flow is naturally described as a state machine.

So AWS Step Functions seemed to be an architectural fit for IVR services. But would it provide response times short enough for a great user experience?

Another challenge when developing applications in a cloud environment is to understand the underlying cost dynamics. Would it be cost efficient?

The Use Case

The old copper-wired telephone networks are being closed down as voice and Internet services evolve and are delivered via mobile or fiber networks. But there is a huge installed base of “Things” out there that depend on the legacy copper network for communication.

For example, many residential alarm systems are connected to the telephone network. When the alarm is triggered it will dial a pre-programmed phone number and play a pre-recorded voice message. As the old fixed phone network is getting shut down this function stops working.

IOT Communication is working with a supplier of equipment that allows existing alarm systems to be retrofitted with mobile network connectivity. For many reasons it is more efficient to limit this connectivity to data only and skip the voice service. That also allows other access technologies such as LoRaWAN or NB-IoT to be used for connectivity.

The old alarm system is connected to a device with mobile Internet access. When the alarm is triggered, the device makes an API call to the iotcomms.io platform. This API invokes an IVR service that places a call to the house owner, who in turn acknowledges the alarm with a PIN code.

This allows the end-user experience to stay the same as before when an alarm is triggered, even though the alarm system no longer has a telephone connection.

AWS Lambda vs. Step Functions

Another serverless alternative to AWS Step Functions would be to use AWS Lambda and implement the IVR flow as a separate function. This section compares these two options from a total cost and time-to-market perspective for the development of IVR services.

Direct Cost comparison

Estimating costs when developing applications in the cloud is not straightforward. Different services are priced and metered differently. Even though there may be multiple solutions to the same problem, it is critical to pick the most beneficial one from a total cost perspective. And in the total cost comparison it is not only the direct execution cost that matters; development time and operational overhead count as well.

AWS Lambda lets you run code in a serverless environment where you pay only for the execution time of the code, rounded up to the nearest 100 ms interval. The price per 100 ms of execution time depends on how much memory your function is configured to use. There is also a per-request cost to invoke the Lambda function. Lambda functions have an upper execution time limit of 15 minutes.

AWS Step Functions is not priced by execution time; it is charged per state change in the workflow. A Step Functions workflow has a maximum execution time of one year.

To save some zeroes in the cost calculations we use µ$ (1/1,000,000 $). It is assumed that the lowest Lambda memory configuration of 128 MB is enough for the IVR workflow. Below are the costs for Lambda and Step Functions:

| Cost | Lambda | Step Functions |
|---|---|---|
| Per second of execution | 2.08 µ$ | |
| Per invocation | 0.2 µ$ | |
| Per state change | | 25 µ$ |

Since the costs are not directly comparable, we must have some more parameters as input. The first parameter is the total duration of the IVR flow since this would have an impact on the Lambda cost. The second parameter is the complexity of the IVR dialog, the number of state changes.

The table below shows the total direct cost comparison for a few different IVR scenarios:

| Call duration (minutes) | State transitions (number) | Lambda cost (µ$) | Step Functions cost (µ$) |
|---|---|---|---|
| 0.5 | 4 | 62.5 | 100 |
| 1 | 4 | 125 | 100 |
| 3 | 10 | 375 | 250 |
| 5 | 14 | 625 | 350 |

In these scenarios AWS Step Functions has a cost advantage once the call lasts about one minute or longer. The more idle time there is in the IVR flow, the stronger the case for Step Functions.
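The comparison can be reproduced with a few lines of Python. This is only a sketch based on the unit prices listed above; the function names are illustrative. Because it uses the rounded price of 2.08 µ$ per second and includes the invocation fee, the Lambda figures differ from the table by less than one µ$.

```python
# Unit prices in micro-dollars (µ$), taken from the cost table in the article.
LAMBDA_PER_SECOND = 2.08      # 128 MB memory configuration
LAMBDA_PER_INVOCATION = 0.2
STEP_FUNCTIONS_PER_TRANSITION = 25.0


def lambda_cost(duration_minutes: float) -> float:
    """Cost in µ$ of one Lambda invocation that runs for the whole call."""
    return duration_minutes * 60 * LAMBDA_PER_SECOND + LAMBDA_PER_INVOCATION


def step_functions_cost(transitions: int) -> float:
    """Cost in µ$ of one workflow execution with the given state transitions."""
    return transitions * STEP_FUNCTIONS_PER_TRANSITION


def break_even_seconds(transitions: int) -> float:
    """Call duration above which Step Functions is cheaper than Lambda."""
    return (step_functions_cost(transitions) - LAMBDA_PER_INVOCATION) / LAMBDA_PER_SECOND


if __name__ == "__main__":
    for minutes, transitions in [(0.5, 4), (1, 4), (3, 10), (5, 14)]:
        print(
            f"{minutes} min, {transitions} transitions: "
            f"Lambda {lambda_cost(minutes):.1f} µ$, "
            f"Step Functions {step_functions_cost(transitions):.1f} µ$"
        )
```

With four state transitions the break-even point is roughly 48 seconds of call time, and with ten transitions it is roughly two minutes.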

A typical IVR flow with a few levels of interactions has an average call duration of 3 minutes and 10 state changes.  

So the use of Step Functions for IVR services seems feasible. If the call duration is long in comparison to the number of state changes, it has a clear cost advantage.

What about the development and operational costs for the two alternatives?

Development and operational overhead

If you want to define the IVR logic as a Lambda function, it requires developer skills in one of the programming languages supported by Lambda. It also requires custom code to log and track execution, as well as to handle timeouts, retries and errors.

Step Functions workflows are defined in the Amazon States Language, and the AWS Console provides tools to generate and visualize the state machines. It also provides logs and visualization of state machine executions. One advantage of using Step Functions is that the environment has built-in support for error handling and timeouts in case a task is not executed properly or an unexpected result is returned. If you implement the logic in Lambda you must take care of this in code.

The latter is a major advantage for Step Functions over Lambda. As said in the introduction, the different IVR flows are likely to be customer and use case specific. Therefore, efficiency in developing and troubleshooting the services is key. In fact, the Amazon States Language is intuitive enough to let anyone who has mastered formulas in Excel develop and deploy production-grade logic. This has a huge impact on time to market and offloads already busy developer teams by allowing customer-facing staff to implement this logic.

Lambda or Step Functions?

Lambda is a great tool in the toolbox, but both the cost comparison and the time-to-market assessment indicate that Step Functions is the winner.

It is only for very short-lived interactions that Lambda has a cost advantage. But in absolute dollars the total cost is very small, and the cost overhead for 50 000 executions is on the order of the cost of an ice cream. This is a fairly large number of executions, which also hints that the time-to-market and operations dimensions carry more weight in the comparison.

Now we have enough justification to go down this path. But will it work in practice and provide a good user experience?
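As a rough check of that claim, the sketch below multiplies the per-call difference for the shortest scenario in the comparison (half a minute, four state transitions) by 50 000 executions. The prices are the ones quoted earlier in the article; the function name is illustrative.

```python
# Unit prices in µ$ as quoted in the cost comparison.
LAMBDA_PER_SECOND = 2.08
LAMBDA_PER_INVOCATION = 0.2
STEP_FUNCTIONS_PER_TRANSITION = 25.0


def overhead_dollars(executions: int, duration_seconds: float, transitions: int) -> float:
    """Extra cost in dollars of Step Functions compared to Lambda."""
    lambda_cost = duration_seconds * LAMBDA_PER_SECOND + LAMBDA_PER_INVOCATION
    step_cost = transitions * STEP_FUNCTIONS_PER_TRANSITION
    return executions * (step_cost - lambda_cost) / 1_000_000


if __name__ == "__main__":
    print(f"{overhead_dollars(50_000, 30, 4):.2f} $")
```

Under these assumptions the overhead comes to a little under two dollars.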

Architecture

Architecture overview

Even though there was a specific use case to be validated the architecture must be generic enough to allow multiple IVR flows to be deployed without modifications to the infrastructure.

Amazon API Gateway is used to expose an IVRWorkflow API that triggers specific Step Functions workflows. The API takes the name of the workflow to be executed as a parameter, as well as the IVR flow specific parameters.
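A minimal sketch of what the backend behind such an API might do, assuming it is written in Python with boto3 and that the workflow name maps directly onto a state machine name. The function names, the default region, the placeholder account ID and the parameter layout are illustrative assumptions, not the actual iotcomms.io implementation; `start_execution` is the standard Step Functions API call.

```python
import json


def build_start_execution_args(workflow_name, params,
                               region="eu-west-1", account_id="123456789012"):
    """Build the keyword arguments for the Step Functions StartExecution call.

    The account ID default is a placeholder, not a real account.
    """
    state_machine_arn = (
        f"arn:aws:states:{region}:{account_id}:stateMachine:{workflow_name}"
    )
    return {
        "stateMachineArn": state_machine_arn,
        # The execution input must be a JSON document serialized as a string.
        "input": json.dumps(params),
    }


def start_ivr_workflow(workflow_name, params):
    """Start the named workflow (needs AWS credentials and boto3 installed)."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("stepfunctions")
    response = client.start_execution(
        **build_start_execution_args(workflow_name, params)
    )
    return response["executionArn"]
```

For the alarm use case, `params` would carry at least the `destinationNumber` that the workflow's first state reads from its input.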

In the iotcomms.io platform there already existed media services functionality which allowed IVR flows to be defined in code. For some of the media services Amazon Polly is used to generate speech from text.

This media service layer was extended to implement AWS Step Functions Activity Workers. An Activity Worker is a remote process that polls Step Functions for tasks and executes them as part of a workflow.

A new mediaService Step Functions Activity was implemented and registered with AWS Step Functions to expose the functionality provided by the media service. It acts as the interface between the media service logic and the Step Functions workflows. For scalability and reliability, multiple Activity Workers are deployed.
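The core loop of an Activity Worker could look roughly like the following Python sketch. It assumes boto3's Step Functions client (`get_activity_task`, `send_task_success` and `send_task_failure` are existing API calls); the `handlers` mapping, the worker name and the function names are hypothetical stand-ins for the media service functions.

```python
import json


def poll_once(sfn, activity_arn, handlers, worker_name="media-worker"):
    """Fetch at most one activity task, run it and report the result.

    `sfn` is a Step Functions client, e.g. boto3.client("stepfunctions").
    `handlers` maps a command name to a function taking the task parameters.
    Returns True if a task was processed, False if the long poll timed out.
    """
    task = sfn.get_activity_task(activityArn=activity_arn, workerName=worker_name)
    token = task.get("taskToken")
    if not token:
        return False  # no task was scheduled during the long poll

    try:
        params = json.loads(task["input"])
        handler = handlers[params["command"]]
        result = handler(params)
    except Exception as exc:  # report any failure back to the workflow
        sfn.send_task_failure(
            taskToken=token, error=type(exc).__name__, cause=str(exc)
        )
    else:
        sfn.send_task_success(taskToken=token, output=json.dumps(result))
    return True


def run_worker(activity_arn, handlers):
    """Poll forever (needs AWS credentials and boto3 installed)."""
    import boto3

    sfn = boto3.client("stepfunctions")
    while True:
        poll_once(sfn, activity_arn, handlers)
```

Passing the client in as an argument keeps the loop easy to test with a stub, and several such workers can poll the same activity in parallel.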

Alarm Notification Step Functions Workflow

A Step Functions workflow was created to fulfil the required interaction flow for the alarm notification service. The workflow describes a state machine where each state executes a task, and the result is passed on to the next state until the workflow has completed.

Alarm notification Step Functions workflow

Above you see the visual representation of the IVR Step Functions workflow. The workflow was defined using the Amazon States Language:

{
  "Comment": "Alarm notification workflow.",
  "StartAt": "placeCall",
  "Version": "1.0",
  "TimeoutSeconds": 600,
  "States": {
    "placeCall": {
      "Type": "Task",
      "Resource": "arn:aws:states:eu-west-1:<AWSACCOUNT>:activity:mediaService",
      "Next": "playAndGetDTMF",
      "Parameters": {
        "command": "placeCall",
        "destinationNumber.$": "$.destinationNumber"
      }
    },
    "playAndGetDTMF": {
      "Type": "Parallel",
      "Next": "validateCode",
      "Branches": [
        {
          "StartAt": "playVoiceMessage",
          "States": {
            "playVoiceMessage": {
              "Type": "Task",
              "End": true,
              "Parameters": {
                "say": "The alarm has been started. Please confirm with your PIN code",
                "command": "playPrompt",
                "dialogId.$": "$.dialogId",
                "destinationNumber.$": "$.destinationNumber"
              },
              "Resource": "arn:aws:states:eu-west-1:<AWSACCOUNT>:activity:mediaService"
            }
          }
        },
        {
          "StartAt": "waitForCode",
          "States": {
            "waitForCode": {
              "Type": "Task",
              "Parameters": {
                "command": "waitForDTMF",
                "terminationKey": "#",
                "dialogId.$": "$.dialogId",
                "destinationNumber.$": "$.destinationNumber"
              },
              "Resource": "arn:aws:states:eu-west-1:<AWSACCOUNT>:activity:mediaService",
              "Next": "hangupCall"
            },
            "hangupCall": {
              "Type": "Task",
              "Parameters": {
                "command": "hangupCall",
                "dialogId.$": "$.dialogId",
                "dtmfBuffer.$": "$.dtmfBuffer",
                "destinationNumber.$": "$.destinationNumber"
              },
              "Resource": "arn:aws:states:eu-west-1:<AWSACCOUNT>:activity:mediaService",
              "End": true
            }
          }
        }
      ]
    },
    "validateCode": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$[1].dtmfParams.dtmfBuffer",
          "StringEquals": "",
          "Next": "noInput"
        },
        {
          "Variable": "$[1].dtmfParams.dtmfBuffer",
          "StringEquals": "1234",
          "Next": "ValidCode"
        }
      ],
      "Default": "InvalidCode"
    },
    "ValidCode": {
      "Type": "Succeed"
    },
    "noInput": {
      "InputPath": "$[1]",
      "Type": "Wait",
      "Seconds": 10,
      "Next": "placeCall"
    },
    "InvalidCode": {
      "Type": "Fail",
      "Cause": "Invalid code.",
      "Error": "ErrorA"
    }
  }
}

The mediaService activity is exposed as a task worker resource, as seen in the “Resource” element of the Task type states:

"Type": "Task",
"Resource": "arn:aws:states:eu-west-1:<AWSACCOUNT>:activity:mediaService",

This activity resource takes a “command” parameter which indicates which media service function to run for the task. For the alarm notification IVR example the following commands are used:

  • placeCall – instructs the media service to place a call to a phone number. The phone number is passed in the destinationNumber parameter. The task returns whether the call was connected or not.
  • playPrompt – instructs the service to play a message to the connected party. In this example the message is generated using Amazon Polly text-to-speech by providing the text to be spoken in the “say” parameter.
  • waitForDTMF – tells the media service to listen for DTMF tone input and report back the received input. The “terminationKey” parameter tells it to listen for input until a “#” has been entered.
  • hangupCall – is used to hang up the current call.

As seen, the activity commands are deliberately very generic so that they can be used as basic building blocks, while the actual application logic is defined in the state machine.
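Because every task uses the same activity resource and differs only in its parameters, Task states can also be generated programmatically. The Python helper below is a hypothetical illustration (the name `media_task` and its signature are not part of the platform); it produces Task states in the same shape as the ones in the workflow definition above. The activity ARN is the placeholder from that definition.

```python
import json

# Placeholder ARN copied from the workflow definition; <AWSACCOUNT> is not filled in.
ACTIVITY_ARN = "arn:aws:states:eu-west-1:<AWSACCOUNT>:activity:mediaService"


def media_task(command, next_state=None, literal=None, from_input=()):
    """Return an Amazon States Language Task state for one media service command.

    `literal` holds fixed parameter values; each name in `from_input` is copied
    from the state input using a JSONPath reference ("name.$": "$.name").
    """
    parameters = {"command": command}
    parameters.update(literal or {})
    for name in from_input:
        parameters[f"{name}.$"] = f"$.{name}"

    state = {"Type": "Task", "Resource": ACTIVITY_ARN, "Parameters": parameters}
    if next_state is None:
        state["End"] = True  # last state of its branch or workflow
    else:
        state["Next"] = next_state
    return state


if __name__ == "__main__":
    wait_for_code = media_task(
        "waitForDTMF",
        next_state="hangupCall",
        literal={"terminationKey": "#"},
        from_input=("dialogId", "destinationNumber"),
    )
    print(json.dumps(wait_for_code, indent=2))
```

The example call rebuilds the waitForCode state of the alarm workflow; a different IVR flow would reuse the same helper with other commands and transitions.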

Below is an overview of how the state machine logic is built up for the alarm notification workflow:

  • placeCall – Execution starts with the placeCall state, which executes an activity task provided by the media services. This makes the platform place a call to the phone number provided in the destinationNumber parameter.
  • playAndGetDTMF – The next state is of the type “Parallel”, which allows multiple tasks to run simultaneously. We use this to start playing the voice notification and at the same time start listening for the PIN code that confirms reception of the message. We want to run these at the same time since the house owner may enter the PIN code before the voice playback has completed. Once a PIN code has been received the call is hung up.
  • validateCode – This is a state of type “Choice” where the entered PIN code is compared against the expected one. In this example the value to match against is hardcoded. In a real application this would be passed as a parameter. This state has three possible outcomes:
    • ValidCode – The PIN codes match.
    • InvalidCode – The codes do not match.
    • noInput – This state is entered if the call ended without any DTMF input. It waits for 10 seconds and then invokes the first placeCall state to re-run the workflow. It keeps repeating until a PIN has been received or until the timeout configured for the entire workflow has been reached.
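The branching performed by validateCode can be mirrored in a few lines of Python, which is handy for reasoning about the flow outside AWS. This is only a sketch of the Choice rules in the workflow definition, not how Step Functions evaluates them internally; the function name is illustrative, and the expected PIN is passed as an argument instead of being hardcoded.

```python
def validate_code(parallel_output, expected_pin):
    """Return the name of the state the Choice rules would select.

    `parallel_output` is the output of the Parallel state: a list with one
    entry per branch, where entry 1 comes from the DTMF branch.
    """
    dtmf_buffer = parallel_output[1]["dtmfParams"]["dtmfBuffer"]
    if dtmf_buffer == "":
        return "noInput"      # nothing entered: wait, then place the call again
    if dtmf_buffer == expected_pin:
        return "ValidCode"
    return "InvalidCode"      # the Default of the Choice state


if __name__ == "__main__":
    sample_output = [{}, {"dtmfParams": {"dtmfBuffer": "1234"}}]
    print(validate_code(sample_output, "1234"))
```

The rule order matters in the same way as in the state machine: the empty-input check comes first, so an unanswered call leads to a retry instead of a failure.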

Result and conclusion

Having added support for Step Functions Activity Workers to the media services layer and implemented the alarm notification workflow, it was time to evaluate the user experience. Would Step Functions be responsive enough? How much delay would the on-demand generation of voice prompts using Amazon Polly add?

You can see the outcome in the video showing the execution of the workflow.

The outcome of the tests was satisfactory. The concept of using Activity Workers to trigger the media server functionality did not add significant latency. And the time it took to generate the voice prompts using Amazon Polly was also within acceptable bounds.

The graphical visualization of a running workflow in the AWS Console and the logs of previously run workflows were helpful during the development and testing of the workflow logic, and the Amazon States Language was intuitive to use for describing the IVR workflows.

In summary, the idea of using AWS Step Functions to develop IVR workflows was verified to work in practice. This means that we now have one more AWS cloud-native interface, in addition to SNS, SQS and MQTT, as an alternative to the traditional REST interface in the iotcomms.io platform.