Serverless IVR logic using AWS Step Functions

This article describes and tests the idea of using AWS Step Functions to create customer-specific IVR logic. An IoT use case is used to illustrate the customer need and the solution. Cost and architectural considerations are also covered.

Demo of IVR implemented using AWS Step Functions

IVR is an abbreviation of Interactive Voice Response, the kind of system that automatically answers an incoming call and lets you navigate through a set of choices before your call finally gets connected.

Typically an IVR flow is customer specific, which means that if you run a multi-tenant or multi-customer environment you must be able to define different navigation flows for different phone numbers and customers.

Modern application development based on microservices mandates an architecture with loosely coupled services that are independently deployable. Ideally these services are built upon serverless technology to ensure scalable and cost-efficient implementation.

AWS Step Functions is a service that enables coordinated execution of the components in a distributed application. This is typically needed for a service that is composed of multiple microservices. It provides a serverless infrastructure and environment where workflows are defined and executed. An AWS Step Functions workflow is expressed as a state machine where tasks are executed in the different states. This is a great fit for IVR services, since an IVR flow is naturally described as a state machine.

So AWS Step Functions also seemed to be an architectural fit for IVR services. But would it provide response times good enough for a great user experience?

Another challenge when developing applications in a cloud environment is to understand the underlying cost dynamics. Would it be cost efficient?

The Use Case

The traditional copper-wire telephone networks are being closed down as voice and Internet services evolve and are delivered via mobile or fiber networks. But there is a huge installed base of “Things” out there that depend on the legacy copper network for communication.

For example, many residential alarm systems are connected to the telephone network. When the alarm is triggered it will dial a pre-programmed phone number and play a pre-recorded voice message. As the old fixed phone network is getting shut down this function stops working.

IOT Communication is working with a supplier of equipment that allows existing alarm systems to be retrofitted with mobile network connectivity. For many reasons it is more efficient to limit this connectivity to data only and skip the voice service. That also allows other access technologies such as LoRaWAN or NB-IoT to be used for connectivity.

The old alarm system is connected to a device with mobile Internet access. When the alarm is triggered, the device makes an API call back to the iotcomms.io platform. This API invokes an IVR service that places a call to the house owner, who in turn acknowledges the alarm with a PIN code.

This allows the end user experience to remain the same as before when an alarm is triggered, even though the alarm system no longer has a telephone connection.

AWS Lambda vs. Step Functions

Another serverless alternative to AWS Step Functions would be to use AWS Lambda and implement the IVR flow as a separate function to be executed. This section compares the two options from a total cost and time to market perspective for the development of IVR services.

Direct Cost comparison

Estimating costs when developing applications in the cloud is not straightforward. Different services are priced and measured differently. Even though there may be multiple solutions to the same problem, it is critical to pick the most beneficial solution from a total cost perspective. And in the total cost comparison it is not only the direct execution cost that matters; development time and operational overhead count as well.

AWS Lambda lets you run code in a serverless environment where you pay only for the execution time of the code, rounded up to the nearest 100 ms interval. The price per 100 ms of execution time depends on how much memory your function is configured to use. There is also a per-request cost to invoke the Lambda function. Lambda functions have an upper execution time limit of 15 minutes.

AWS Step Functions is not priced on execution time; it is charged per state change in the workflow. A Step Functions workflow has a maximum execution time of one year.

To save some zeroes in the cost calculations we use µ$ (1/1,000,000 $). It is assumed that the lowest memory configuration for Lambda, 128 MB, is enough for the IVR workflow. Below are the costs for Lambda and Step Functions:

Cost                    Lambda      Step Functions
Per second execution    2.08 µ$     -
Per invocation          0.2 µ$      -
Per state change        -           25 µ$

Since the costs are not directly comparable, we need some more parameters as input. The first parameter is the total duration of the IVR flow, since this has an impact on the Lambda cost. The second parameter is the complexity of the IVR dialog, expressed as the number of state changes.

The table below shows the total direct cost comparison for a few different IVR scenarios:

Call duration (minutes)    State transitions (number)    Lambda cost (µ$)    Step Functions cost (µ$)
0.5                        4                             62.5                100
1                          4                             125                 100
3                          10                            375                 250
5                          14                            625                 350

In these scenarios AWS Step Functions has a cost advantage once the call lasts about one minute or longer. The more idle time in the IVR flow, the stronger the case for Step Functions.
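The comparison can be reproduced with a few lines of Python. This is a minimal sketch that assumes the per-unit prices quoted in this article and a single Lambda invocation that stays alive for the whole call; the function names are illustrative only. The computed Lambda figures differ marginally from the table, which is rounded.

```python
# Prices in micro-dollars (µ$), taken from the pricing table in this article.
LAMBDA_PER_SECOND = 2.08            # 128 MB memory configuration
LAMBDA_PER_INVOCATION = 0.2
STEP_FUNCTIONS_PER_TRANSITION = 25.0


def lambda_cost(duration_seconds: float) -> float:
    """Cost in µ$ of one Lambda invocation that runs for the whole call."""
    return LAMBDA_PER_INVOCATION + LAMBDA_PER_SECOND * duration_seconds


def step_functions_cost(transitions: int) -> float:
    """Cost in µ$ of one workflow execution with the given number of state changes."""
    return STEP_FUNCTIONS_PER_TRANSITION * transitions


def break_even_seconds(transitions: int) -> float:
    """Call duration above which Step Functions becomes cheaper than Lambda."""
    return (step_functions_cost(transitions) - LAMBDA_PER_INVOCATION) / LAMBDA_PER_SECOND


if __name__ == "__main__":
    # (call duration in minutes, number of state transitions)
    for minutes, transitions in [(0.5, 4), (1, 4), (3, 10), (5, 14)]:
        print(
            f"{minutes} min, {transitions} transitions: "
            f"Lambda {lambda_cost(minutes * 60):.1f} µ$, "
            f"Step Functions {step_functions_cost(transitions):.1f} µ$, "
            f"break-even at {break_even_seconds(transitions):.0f} s"
        )
```

With these prices the break-even point for a flow with four state changes is roughly 48 seconds, and it grows linearly with the number of state changes.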

A typical IVR flow with a few levels of interactions has an average call duration of 3 minutes and 10 state changes.  

So it seems that using Step Functions for IVR services is feasible. If the call duration is long in comparison to the number of state changes, it has a clear cost advantage.

What about the development and operational costs for the two alternatives?

Development and operational overhead

Defining the IVR logic as a Lambda function requires developer skills in one of the programming languages supported by Lambda. It also requires custom code to log and track execution, as well as handling of timeouts, retry mechanisms and errors.

Step Functions workflows are defined in the Amazon States Language, and the AWS Console provides tools to generate and visualize the state machines. It also provides logs and visualization of state machine execution. One advantage of using Step Functions is that the environment has built-in support for error handling and timeouts in case a task is not executed properly or an unexpected result is returned. If you implement the logic in Lambda you must take care of this in code.
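To give an idea of what the built-in error handling looks like, here is a small sketch that builds a single Task state as a Python dictionary and prints it as Amazon States Language JSON. "TimeoutSeconds", "Retry" and "Catch" are standard fields of the language; the state names, the placeholder ARN and all numeric values are made up for illustration and are not taken from the workflow in this article.

```python
import json

# Hypothetical Task state. Only the field names (TimeoutSeconds, Retry, Catch)
# and the error names (States.Timeout, States.ALL) are defined by the
# Amazon States Language; everything else is an illustrative assumption.
task_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:<REGION>:<AWSACCOUNT>:activity:<ACTIVITY>",
    "TimeoutSeconds": 60,                    # fail the task if it runs longer than this
    "Retry": [
        {
            "ErrorEquals": ["States.Timeout"],
            "IntervalSeconds": 5,            # wait before the first retry
            "MaxAttempts": 2,
            "BackoffRate": 2.0,              # double the wait for each further retry
        }
    ],
    "Catch": [
        # Any error that is still unhandled after the retries goes to a fallback state.
        {"ErrorEquals": ["States.ALL"], "Next": "handleFailure"}
    ],
    "Next": "nextState",
}

print(json.dumps(task_state, indent=2))
```

The equivalent behavior in a Lambda-only implementation would have to be written, tested and maintained as application code.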

The latter is a major advantage for Step Functions over Lambda. As said in the introduction, the different IVR flows are likely to be customer and use case specific. Therefore the efficiency of developing and troubleshooting the services is key. In fact, the Amazon States Language is intuitive enough that anyone who has mastered formulas in Excel can develop and deploy production-grade logic. This has a huge impact on time to market and offloads already busy developer teams by allowing customer-facing staff to implement this logic.

Lambda or Step Functions?

Lambda is a great tool in the toolbox, but both the cost comparison and the time to market assessment indicate that Step Functions is the winner.

It is only for very short-lived interactions that Lambda has a cost advantage. But in absolute dollars the total cost is very small, and the cost overhead for 50 000 executions is on the order of the price of an ice cream. That is a fairly large number of executions, which hints that the time to market and operations dimensions carry more weight in the comparison.
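The ice cream estimate can be checked with a short calculation. The sketch below assumes the shortest scenario from the cost table (a 30 second call with four state changes); the variable names are illustrative.

```python
# Costs in micro-dollars (µ$) for the 0.5 minute, 4 state change scenario in the cost table.
lambda_cost_micro = 62.5
step_functions_cost_micro = 100.0
executions = 50_000

# Extra cost of choosing Step Functions over Lambda, converted from µ$ to dollars.
overhead_dollars = (step_functions_cost_micro - lambda_cost_micro) * executions / 1_000_000

print(f"Overhead for {executions} executions: {overhead_dollars:.3f} $")
```

The overhead comes out at just under two dollars.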

Now we have enough justification to go down this path. But will it work in practice and provide a good user experience?

Architecture

Architecture overview

Even though there was a specific use case to be validated the architecture must be generic enough to allow multiple IVR flows to be deployed without modifications to the infrastructure.

Amazon API Gateway is used to expose an IVRWorkflow API that triggers specific Step Functions workflows. The API takes the name of the workflow to be executed as a parameter, as well as the IVR flow specific parameters.
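How the IVRWorkflow API is wired to Step Functions is not detailed here, so the following is only a sketch of what the backend of such an API could look like. It assumes a client object that behaves like boto3.client("stepfunctions"), whose start_execution call accepts stateMachineArn, name and input. The function name and the lookup table from workflow names to state machine ARNs are hypothetical.

```python
import json
import uuid


def start_ivr_workflow(sfn_client, state_machine_arns, workflow_name, parameters):
    """Start the named Step Functions workflow with IVR flow specific parameters.

    sfn_client is expected to behave like boto3.client("stepfunctions").
    state_machine_arns is a hypothetical mapping of workflow name to state machine ARN.
    Returns the ARN of the started execution.
    """
    if workflow_name not in state_machine_arns:
        raise KeyError(f"Unknown workflow: {workflow_name}")
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arns[workflow_name],
        # Execution names must be unique per state machine, so add a random suffix.
        name=f"{workflow_name}-{uuid.uuid4()}",
        input=json.dumps(parameters),
    )
    return response["executionArn"]


# Example call (requires boto3 and AWS credentials, so it is left as a comment):
#   import boto3
#   arn = start_ivr_workflow(
#       boto3.client("stepfunctions"),
#       {"alarmNotification": "arn:aws:states:eu-west-1:<AWSACCOUNT>:stateMachine:<NAME>"},
#       "alarmNotification",
#       {"destinationNumber": "<PHONE NUMBER>"},
#   )
```

Passing the client in as an argument keeps the function easy to test with a stand-in object and leaves the infrastructure unchanged when new workflows are added to the lookup table.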

In the iotcomms.io platform there already existed media services functionality which allowed IVR flows to be defined in code. For some of the media services Amazon Polly is used to generate speech from text.

This media service layer was extended to implement AWS Step Functions Activity Workers. An Activity Worker is a remote process that executes tasks as part of a Step Functions workflow.

A new mediaService Step Functions Activity was implemented and registered with AWS Step Functions to expose the functionality provided by the media service. This acted as the interface between the media service logic and the Step Functions workflows. For scalability and reliability, multiple Activity Workers are deployed.
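The core of an Activity Worker is a polling loop. The sketch below is not the iotcomms.io implementation; it only shows the general pattern, assuming a client that behaves like boto3.client("stepfunctions") with its get_activity_task, send_task_success and send_task_failure calls. The function name, the worker name and the handler argument are hypothetical.

```python
import json


def poll_once(sfn_client, activity_arn, handler, worker_name="media-worker-1"):
    """Fetch at most one activity task, run it, and report the result.

    handler is a callable that takes the task input as a dict and returns a
    JSON-serializable result. Returns True if a task was processed and False
    if the long poll ended without any work.
    """
    task = sfn_client.get_activity_task(activityArn=activity_arn, workerName=worker_name)
    token = task.get("taskToken")
    if not token:
        return False  # no task was scheduled during the long poll
    try:
        result = handler(json.loads(task["input"]))
        sfn_client.send_task_success(taskToken=token, output=json.dumps(result))
    except Exception as exc:
        # Report the failure so the workflow's own error handling can take over.
        sfn_client.send_task_failure(
            taskToken=token, error=type(exc).__name__, cause=str(exc)
        )
    return True
```

A worker process would call poll_once in an endless loop. Running several such processes against the same activity ARN is what provides the scalability and reliability mentioned above, since each scheduled task is handed to only one worker.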

Alarm Notification Step Functions Workflow

A Step Functions workflow was created to fulfil the required interaction flow for the alarm notification service. The workflow describes a state machine where each state executes a task, and the result of the execution is passed on to the next state for further execution until the workflow has completed.

Alarm notification Step Functions workflow

Above you see the visual representation of the IVR Step Functions workflow. The workflow was defined using the Amazon States Language:

{
  "Comment": "Alarm notification workflow.",
  "StartAt": "placeCall",
  "Version": "1.0",
  "TimeoutSeconds": 600,
  "States": {
    "placeCall": {
      "Type": "Task",
      "Resource": "arn:aws:states:eu-west-1:<AWSACCOUNT>:activity:mediaService",
      "Next": "playAndGetDTMF",
      "Parameters": {
        "command": "placeCall",
        "destinationNumber.$": "$.destinationNumber"
      }
    },
    "playAndGetDTMF": {
      "Type": "Parallel",
      "Next": "validateCode",
      "Branches": [
        {
          "StartAt": "playVoiceMessage",
          "States": {
            "playVoiceMessage": {
              "Type": "Task",
              "End": true,
              "Parameters": {
                "say": "The alarm has been started. Please confirm with your PIN code",
                "command": "playPrompt",
                "dialogId.$": "$.dialogId",
                "destinationNumber.$": "$.destinationNumber"
              },
              "Resource": "arn:aws:states:eu-west-1:<AWSACCOUNT>:activity:mediaService"
            }
          }
        },
        {
          "StartAt": "waitForCode",
          "States": {
            "waitForCode": {
              "Type": "Task",
              "Parameters": {
                "command": "waitForDTMF",
                "terminationKey": "#",
                "dialogId.$": "$.dialogId",
                "destinationNumber.$": "$.destinationNumber"
              },
              "Resource": "arn:aws:states:eu-west-1:<AWSACCOUNT>:activity:mediaService",
              "Next": "hangupCall"
            },
            "hangupCall": {
              "Type": "Task",
              "Parameters": {
                "command": "hangupCall",
                "dialogId.$": "$.dialogId",
                "dtmfBuffer.$": "$.dtmfBuffer",
                "destinationNumber.$": "$.destinationNumber"
              },
              "Resource": "arn:aws:states:eu-west-1:<AWSACCOUNT>:activity:mediaService",
              "End": true
            }
          }
        }
      ]
    },
    "validateCode": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$[1].dtmfParams.dtmfBuffer",
          "StringEquals": "",
          "Next": "noInput"
        },
        {
          "Variable": "$[1].dtmfParams.dtmfBuffer",
          "StringEquals": "1234",
          "Next": "ValidCode"
        }
      ],
      "Default": "InvalidCode"
    },
    "ValidCode": {
      "Type": "Succeed"
    },
    "noInput": {
      "InputPath": "$[1]",
      "Type": "Wait",
      "Seconds": 10,
      "Next": "placeCall"
    },
    "InvalidCode": {
      "Type": "Fail",
      "Cause": "Invalid code.",
      "Error": "ErrorA"
    }
  }
}

The mediaService activity is exposed as a task worker resource, as seen in the “Resource” element of the Task type states:

"Type": "Task",
"Resource": "arn:aws:states:eu-west-1:<AWSACCOUNT>:activity:mediaService",

This activity resource takes a “command” parameter which indicates which media service function to run for the task. For the alarm notification IVR example the following commands are used:

  • placeCall – instructs the media service to place a call to a phone number. The phone number is passed in the destinationNumber parameter. The task returns whether the call was connected or not.
  • playPrompt – instructs the service to play out a message to the connected party. In this example the message is generated using Amazon Polly text to speech by providing the text to be spoken in the “say” parameter.
  • waitForDTMF – tells the media service to listen for DTMF tone input and report back the received input. The “terminationKey” parameter tells it to listen for input until a “#” has been entered.
  • hangupCall – is used to hang up the current call.
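On the worker side, routing on the “command” parameter can be done with a simple lookup table. The sketch below is an assumption about how such routing could look, not the actual media service code: the StubMediaService class, its method names and its return values are invented stand-ins, and only the four command names come from the list above.

```python
class StubMediaService:
    """Stand-in for the real media service; methods and return values are invented."""

    def place_call(self, params):
        return {"dialogId": "dialog-1", "destinationNumber": params["destinationNumber"]}

    def play_prompt(self, params):
        return {"dialogId": params["dialogId"], "played": params["say"]}

    def wait_for_dtmf(self, params):
        return {"dialogId": params["dialogId"], "dtmfBuffer": "1234"}

    def hangup_call(self, params):
        return {"dialogId": params["dialogId"], "hungUp": True}


def make_dispatcher(media_service):
    """Return a function that routes a task input to the matching media service call."""
    commands = {
        "placeCall": media_service.place_call,
        "playPrompt": media_service.play_prompt,
        "waitForDTMF": media_service.wait_for_dtmf,
        "hangupCall": media_service.hangup_call,
    }

    def dispatch(task_input):
        command = task_input.get("command")
        if command not in commands:
            raise ValueError(f"Unsupported command: {command!r}")
        return commands[command](task_input)

    return dispatch
```

Adding a new building block then only means adding one entry to the table; the workflows themselves decide in which order the commands are used.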

As seen, the activity commands are deliberately very generic so that they can be used as basic building blocks, while the actual application logic is defined in the state machine.

Below is an overview of how the state machine logic is built up for the alarm notification workflow:

  • placeCall – Execution starts with the placeCall state, which executes an activity task provided by the media services. This makes the platform place a call to the phone number provided in the destinationNumber parameter.
  • playAndGetDTMF – The next state is of the type “Parallel”, which allows multiple tasks to run simultaneously. We use this to start playout of the voice notification and at the same time start listening for the PIN code that confirms reception of the message. We want to run these at the same time since the house owner may enter the PIN code before the voice playout has completed. Once a PIN code has been received the call is hung up.
  • validateCode – This is a state of type “Choice” where the entered PIN code is compared against the expected one. In this example the value to match against is hardcoded. In a real application this would be passed as a parameter. This state has three possible outcomes:
    • ValidCode – The PIN codes match.
    • InvalidCode – The codes do not match.
    • noInput – This state is entered if the call ended without any DTMF input. It waits for 10 seconds and then invokes the first placeCall state to re-run the workflow. It keeps repeating until a PIN has been received or until the timeout configured for the entire workflow has been reached.
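The decision made in the validateCode state can be mirrored in a few lines of Python, which may help when reasoning about the three outcomes. This is only an illustration of the Choice logic: the function name is made up, and the expected PIN is passed in as an argument rather than hardcoded. The input shape follows the “$[1].dtmfParams.dtmfBuffer” path used in the workflow, where index 1 is the output of the second parallel branch.

```python
def validate_code(parallel_output, expected_pin):
    """Return the name of the state that the validateCode Choice state would select.

    parallel_output is the list produced by the Parallel state, one entry per branch.
    """
    dtmf_buffer = parallel_output[1]["dtmfParams"]["dtmfBuffer"]
    if dtmf_buffer == "":
        return "noInput"        # call ended without any DTMF input, so retry later
    if dtmf_buffer == expected_pin:
        return "ValidCode"      # alarm acknowledged
    return "InvalidCode"        # default branch of the Choice state


if __name__ == "__main__":
    sample = [{}, {"dtmfParams": {"dtmfBuffer": "1234"}}]
    print(validate_code(sample, "1234"))
```

The rules are evaluated in order, just as in the Choice state, so the empty-input check takes precedence over the PIN comparison.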

Result and conclusion

Having added support for Step Functions Activity Workers to the media services layer and implemented the alarm notification workflow, it was time to evaluate the user experience. Would Step Functions be responsive enough? How much delay would the on-demand generation of voice prompts using Amazon Polly add?

You can see the outcome in the video showing the execution of the workflow.

The outcome of the tests was satisfactory. The concept of using Activity workers to trigger the media server functionality did not add significant latency. And the time it took to generate the voice prompts using Amazon Polly was also within bounds.

The graphical visualization of a running workflow in the AWS Console and the logs of previously run workflows were helpful during the development and testing of workflow logic, and the Amazon States Language was intuitive to use for describing the IVR workflows.

In summary, the idea of using AWS Step Functions to develop IVR workflows was verified to work in practice. This means that we now have one more AWS cloud native interface, in addition to SNS, SQS and MQTT, as an alternative to the traditional REST interface in the iotcomms.io platform.

Another public cloud launch, AWS's 5th site in Europe! Why is this so important to understand?

Amazon Web Services this week announced the opening of its 5th site in Europe. Some people tend to believe that it’s just a bunch of servers for virtualization and storage, but that’s wrong. AWS is so much more: a cloud platform with products, solutions, tools and computing power to help businesses scale and grow. Microsoft Azure and Google Cloud are other examples of public cloud platforms fueling innovation. Public cloud platforms have become the new operating systems and are rapidly gaining acceptance.

Why is this so interesting? Well, over the last 20 years we have moved from “own your stuff” through Infrastructure as a service (IaaS), Software as a Service (SaaS), Platform as a Service (PaaS) and now Function as a Service (FaaS).

This is where the fun part starts. As a software developer you can buy just the functions that you need and pay for usage, whenever needed, without having to worry about operating servers and infrastructure! This is fully in line with sharing economy thinking. Where do you get the functions from? They are mass produced in the public cloud and available globally. The majority of today’s platforms run by operators, alarm companies, enterprises etc. are built upon a monolithic structure, meaning that a piece of SW running on specific HW is designed in such a way that it is predetermined how features and functions are used, operated and work together. It’s really one size fits all. If a change is requested, it will affect all users on that SW platform.

Flexibility is about putting together the function blocks that are required in each specific use case and paying for usage of only those. Gone are the huge investments in your own switches, servers and other monolithic hardware. This is all available in the cloud.

Are we up for a shift? Yes, big time! Think about operators and enterprises that have made huge investments over the years that will become obsolete. Not only as a result of efficiency, but more probably because a monolithic structure of products and solutions will fall. This is also the prime reason why we will see large companies run into difficulties while new innovative companies take the leading role. Companies today with 2, 10 or 20 employees can challenge large companies and take a leading role with exponential growth.

Let’s take a step back and look at what happened about 100 years ago. Henry Ford started large-scale production of the automobile, which came to represent the American dream. Cars fell in price and became affordable to the public. Production using assembly line methods made car prices drop from 850 USD to 250 USD in less than 20 years.

Drawing the parallel to the AWS, Microsoft Azure and Google Cloud computing platforms is not far-fetched. IT and Telecom services will be mass produced on their platforms, driving industry change. Think about what will happen to companies stuck in their legacy, unable to change fast enough. New services will be mass produced at much lower cost and with a better customer experience. Be prepared!