November 10, 2020
Streaming Cloud Workflows and Monitoring
This is the third post in a series about the challenges of network monitoring in a telecommuting world.
The cloud is becoming a critical part of streaming technology stacks, especially in the wake of the global pandemic. Yes, before the pandemic, streaming providers and broadcasters were already moving workflow components, such as encoding, to the cloud. But the local and national lockdowns shone a light on other components that needed to be freed from their physical constraints as well. With offices closed, it was impossible to use the traditional Network Operations Center (NOC) to monitor stream performance and availability. As streaming technologies have become virtualized both pre- and post-pandemic, a much more fundamental issue has been exposed: the current approach to streaming workflow monitoring isn’t flexible enough to track both hardware and software components. For that, the monitoring technologies themselves need to become cloud-based as well.
Why Is Streaming Embracing the Cloud?
Streaming, as a technology stack, was born out of broadcast. It was an extension of the way video was delivered traditionally, either by transmission over terrestrial lines or through physical media. As such, early streaming relied heavily on capital expenses: encoders, servers, and other machines. But as companies began to provide virtualized versions of their technologies, like encoders, streaming providers could readily embrace the shift to operational expenses: paying for a server as it’s needed. The elasticity provided by the cloud also enabled better scalability and its redundant nature ensured reliability of the service for viewers. But there were also other benefits. Providers no longer had to worry about maintenance or updates. Everything was handled by the cloud provider and the technology company providing the virtualized component. And, cloud services enabled streaming providers to get to market much faster, perhaps accounting for the growth of OTT offerings over the past 24 months.
The Future of OTT is a Hybrid Approach
Just as streaming providers have embraced the cloud, so too have they recognized the need for on-premise equipment and third-party providers. Although almost everything can be virtualized, sometimes it makes sense to keep some elements of the streaming workflow behind the corporate firewall. For example, it might make more sense to have initial encoding for a live stream happen on physical machines that can be more closely managed and aren’t subject to the potential latency of cloud resources. The workflow might look something like a rack of encoders producing a master stream and then sending it to cloud-based encoding resources for transcoding and repackaging. Then, from the cloud, the finalized streams could be moved to a content delivery network, either part of the same cloud as the encoders or an entirely different network.
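As a sketch of that hybrid workflow, the pipeline above can be modeled as an ordered list of stages, each tagged with where it runs. The stage names and locations here are illustrative assumptions, not a description of any specific provider’s setup:

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    location: str  # "on-premise", "cloud", or "third-party"

# Hypothetical hybrid workflow: on-premise master encoding,
# cloud-based transcoding/repackaging, third-party CDN delivery.
workflow = [
    Stage("master-encode", "on-premise"),
    Stage("transcode-and-repackage", "cloud"),
    Stage("cdn-delivery", "third-party"),
]

def locations(stages):
    """Return, in order, where each stage of the workflow runs."""
    return [s.location for s in stages]

print(locations(workflow))  # ['on-premise', 'cloud', 'third-party']
```

Even a simple model like this is useful later, because a monitoring strategy has to account for every location a stage can run in.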
A New Approach to Monitoring Is Needed
That hybrid strategy, though, requires a different strategy for monitoring. Before the cloud was used for anything but distribution, almost all of the equipment for streaming was on-premise. Because of this, monitoring could make use of hardware probes installed in the same network. And although these probes provided solid data for streaming operators to examine stream performance, they were ultimately limited in three ways. First, being hardware, they required maintenance and software updates. Second, they couldn’t monitor any resources outside the network, which became problematic as streaming workflow technologies were virtualized. Third, they couldn’t monitor third-party resources, like content delivery networks, which were becoming increasingly important to delivering a great streaming experience.
This new approach to monitoring is based on two core pillars: data acquisition and data visualization. For that first pillar, it is critical for streaming providers to employ technology that captures data from on-premise equipment, virtualized technology in the cloud, and third-party providers. But that data is useless if it can’t be correlated. And that can’t happen if the monitoring of those three areas is handled by different monitoring technologies that provide no means of integration. That’s why data visualization is the second pillar. Just as the approach to acquiring the data must change to accommodate these different groups of technologies within the hybrid streaming workflow, a data visualization strategy must address how to connect the data into a single view. This can really only be done through a monitoring approach built on API data consumption.
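To make the correlation problem concrete, the sketch below imitates how metrics might arrive from three different APIs, each with its own field names and timestamp formats, and normalizes them onto one common schema so they can land in a single view. All field names and values here are hypothetical:

```python
from datetime import datetime, timezone

# Illustrative raw records as three different sources might report them:
# an on-premise probe, a cloud agent, and a third-party CDN API.
onprem = {"ts": "2020-11-10T12:00:00+00:00", "component": "encoder-01", "bitrate_kbps": 8000}
cloud = {"timestamp": 1605009600, "service": "transcoder", "errors": 0}
cdn = {"time": "2020-11-10T12:00:00+00:00", "pop": "syd-1", "cache_hit_ratio": 0.97}

def normalize(record, source):
    """Map one source's fields onto a common schema so records
    from all three areas can be correlated in a single dashboard."""
    if source == "on-premise":
        ts = datetime.fromisoformat(record["ts"])
        name = record["component"]
    elif source == "cloud":
        ts = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)
        name = record["service"]
    else:  # third-party
        ts = datetime.fromisoformat(record["time"])
        name = record["pop"]
    return {"timestamp": ts.isoformat(), "source": source, "component": name}

unified = [
    normalize(onprem, "on-premise"),
    normalize(cloud, "cloud"),
    normalize(cdn, "third-party"),
]
```

Once every record shares the same timestamp and component fields, correlating an encoder spike with a CDN cache miss becomes a simple join rather than a cross-tool guessing game.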
Moving Monitoring to the Cloud
The natural place for this new approach to monitoring is the cloud. Keeping monitoring behind the corporate firewall, especially if it’s built on API data consumption, can create a security risk as well as accessibility issues when people are out of the physical office. But a monitoring solution in the cloud, a dashboard on top of a data repository that can easily connect to any API, allows anyone to access critical operational data about any component within the streaming workflow. And it’s inherently flexible. Rather than relying on hardware-based probes, monitoring in the cloud (as part of a Monitoring-as-a-Service, or MaaS, solution) can employ software agents that can be easily integrated with virtualized workflow components. This makes the entire monitoring architecture extremely flexible for data acquisition, especially with components and third parties that support API integration. Because the agents are software and cloud-based (encapsulated as microservices, for example), they are extremely scalable. If your workflow capacity expands at some point, say needing more caches, the agents can expand as well to ensure that there is no single point of failure for acquiring all the data (this, of course, can’t happen with physical hardware probes, which may tip over when flooded with data). What’s nice about this approach, though, is that these same microservice agent containers can be deployed internally as well. Yes, you’ll require some hardware behind the firewall to install them (a pair of redundant NGINX servers, for example), but once deployed, they can easily collect data from on-premise equipment that supports API integration and relay it back to the cloud monitoring platform.
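A single poll cycle of such an agent can be sketched as a small function where the component fetch and the platform push are injected as callables, so the same logic works whether the agent runs in the cloud or behind the firewall. The agent name, metric fields, and in-memory stand-ins below are all hypothetical:

```python
def run_agent(agent_id, fetch, push):
    """One poll cycle of a hypothetical monitoring agent: fetch
    metrics from a workflow component, wrap them in an envelope
    identifying the agent, and relay them to the cloud platform."""
    envelope = {"agent": agent_id, "metrics": fetch()}
    push(envelope)
    return envelope

# Illustration with in-memory stand-ins for a component's metrics
# API and the cloud platform's ingest endpoint.
relayed = []
run_agent(
    "edge-cache-agent-1",
    fetch=lambda: {"cache_hit_ratio": 0.97, "requests": 1200},
    push=relayed.append,
)
print(relayed[0]["agent"])  # edge-cache-agent-1
```

Because the cycle is just fetch-wrap-push, running more agents in parallel (one per cache, say) is a matter of launching more containers, which is exactly the elasticity argument above.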
With all the data going back to a cloud-based data lake, it’s easy to connect a cloud-based visualization tool (like Datadog or Tableau, for example) that allows for significant customization. You can build whatever dashboard you want, normalize the data in a way that best fits your operations and business needs, and provide access to anyone. That last benefit is key because people are no longer tied to being in the building to see the data: operations people can be anywhere, using any device, to do root-cause analysis.
As you can probably gather by now, a cloud-based monitoring approach, using containerized, microservice agents deployable anywhere with API integration, is highly flexible and scalable. Rather than being tied to a specific platform or operating system, the agents can be deployed in any environment that supports containers, which allows for inherent elasticity as well. Agents don’t tip over because they are overwhelmed. Business logic attached to the container simply spins up a new microservice on a different thread to handle the new demand. This kind of flexibility is critical in an environment such as streaming, where traffic can fluctuate wildly.
It’s important, though, that this kind of monitoring approach, employing agents across cloud, on-premise, and third-party resources, takes best practices into account. A lot of work has already been done in organizations like the Streaming Video Alliance to identify and document the considerations that such an “end-to-end” monitoring solution must address.
If you want to find out more about how Touchstream performs these tasks, watch our latest video.
Planning is Critical
Monitoring can be an open platform. It doesn’t have to exist behind the corporate firewall, tied to screens hanging on a NOC wall. It can leverage the cloud, employ distributed microservice agents to integrate with data sources throughout the workflow, and provide visualization consumable on any device, anywhere. But to make this happen, planning must be a priority. You need to think about which of your workflow technologies can be virtualized, which can be fulfilled by third parties, and how you will get the data from each of those. Once you have that list, you can then assess it against your current approach to monitoring and plan for a clear transition to a cloud-based strategy. In addition, it’s important to select a visualization tool that will support your long-term needs. Can it normalize the data in the way you need? Is it customizable enough to meet your operations and business needs? Does it integrate with the datasets you want to use? (We use Datadog at Touchstream because they have plugins for CDNs and Elemental encoders.) Doing this kind of planning will ensure continuity: even as you expand your monitoring efforts, you can continue to see the data that’s important.
It’s important to remember, too, that the ultimate objective when planning and implementing a cloud-based monitoring approach is being able to identify the root cause of streaming issues. With agents deployed across your cloud and on-premise infrastructure, gathering data from all workflow components, it’s much easier to view granular details, even down to the individual video session. And with a customized dashboard, KPI thresholds that have been exceeded can be seen, and acted upon, from a smartphone.
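The threshold check behind that kind of smartphone alert can be sketched in a few lines. The KPI names and limits below are illustrative assumptions, not values from any real dashboard:

```python
# Hypothetical KPI thresholds for a streaming dashboard.
THRESHOLDS = {
    "rebuffer_ratio": 0.02,  # alert above 2% rebuffering
    "startup_time_s": 4.0,   # alert above 4-second startup
    "error_rate": 0.01,      # alert above 1% errors
}

def exceeded_kpis(sample):
    """Return the KPIs in a metrics sample that exceed their
    thresholds, i.e. the ones worth pushing as a notification."""
    return {k: v for k, v in sample.items()
            if k in THRESHOLDS and v > THRESHOLDS[k]}

sample = {"rebuffer_ratio": 0.05, "startup_time_s": 2.1, "error_rate": 0.01}
print(exceeded_kpis(sample))  # only rebuffer_ratio exceeds its limit
```

Everything below threshold stays quiet; only the breached KPI (here, rebuffering at 5%) would reach the operations engineer’s phone for root-cause analysis.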
The cloud is becoming an intrinsic part of the streaming technology stack. There is no reason that it shouldn’t become integral to monitoring as well.
It’s Time To Embrace Remote Monitoring. Are You Ready?
In the first blog post of this series, we talked about how monitoring streaming workflows, in a post-pandemic world, is changing. Operations engineers and others need to be able to move about freely, not tied to a physical room and screens on the wall. But enabling that, we posited in the second blog post, requires a strategy. Just throwing everyone out of the NOC and expecting them to be efficient and effective, when they are used to dozens of data sources available to them at a glance, isn’t tenable. Finally, this blog post explored the role of the cloud in that remote monitoring solution.
It’s clear that times have changed with respect to how we can monitor streaming workflow components. Not only can we get data about both physical and virtualized elements, but, by leveraging the cloud, we can even get data from third-party providers such as CDNs. And if you don’t want to build it yourself, Touchstream has a solution called VirtualNOC that can quickly get your network operations engineers the data they need to act against problems that might cause subscriber loss. With powerful features like data rewind, allowing you to trace a session all the way back through the workflow to identify where it failed, you’ll never be in more control of your streaming monitoring.