TV service providers are continually making decisions about programming, content purchases, advertising and more. Increasingly they depend on data to guide these choices: what content viewers are watching, how they are watching (binge watching on the weekend, or viewing in short spurts on the commute to work), and much more.
But managing the collected data and then feeding the analytics systems are significant challenges, and often require custom integrations. The service provider must start by defining the business decisions that data is required for – be they scheduling decisions, purchasing, advertising, or a combination of all three. Then they can decide upon the technology that will meet their requirements. Let’s take a look at the TV data itself and then at the technology required to manage it.
What type of data is collected?
Collected data is divided into two main categories: structured and unstructured. Structured data is highly organized and formatted, making it easily searchable in relational databases; examples include dates, addresses and listings of TV programs. Unstructured data has no predefined format or organization and includes items like text, images, video and audio.
Since data collection and processing are so complex, different types of databases are required for different activities. Relational databases organize information into tables of rows and columns, are indexed to speed up searches, and are queried and updated using the SQL language.
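To make the relational model concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample rows are invented for illustration and do not reflect any particular TV platform's schema.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE program_listing ("
    " program_id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " channel TEXT NOT NULL,"
    " air_date TEXT NOT NULL)"  # ISO 8601 date string
)
# An index on air_date lets the database find rows by date without a full scan.
conn.execute("CREATE INDEX idx_listing_air_date ON program_listing (air_date)")

conn.executemany(
    "INSERT INTO program_listing (title, channel, air_date) VALUES (?, ?, ?)",
    [
        ("Evening News", "Channel 1", "2024-03-01"),
        ("Cup Final", "Sports 1", "2024-03-02"),
        ("Late Movie", "Channel 1", "2024-03-02"),
    ],
)


def titles_on(date):
    """Return the titles airing on the given date, in alphabetical order."""
    rows = conn.execute(
        "SELECT title FROM program_listing WHERE air_date = ? ORDER BY title",
        (date,),
    )
    return [title for (title,) in rows]
```

Calling titles_on("2024-03-02") returns the two programs listed for that date. The same pattern of typed columns, an index and SQL queries applies to any structured TV data, such as listings or subscriber records.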
For big data, NoSQL databases such as MongoDB and Cassandra provide scalability and high availability, with fault tolerance built in (for example, when a partition fails these databases are able to continue operating). Sometimes a combination of both types of databases is required.
TV viewing information is not collected in well-organized reports; it is gathered as unstructured data, only some of which is relevant. In this raw form the data is almost impossible to process efficiently. To make a valuable contribution, the data must first be converted into a structured format; then it can be analyzed and manipulated to answer the questions that TV service providers are asking.
Typical data of interest to TV service providers includes the time spent viewing a specific program, whether it was watched live or recorded, and so on. To determine how long viewers watch a particular program, for example, the system needs to know which event opened the viewing session and which event closed it. To obtain that information, however, all events have to be collected and then filtered so that only the relevant events are sent for processing. In addition, analytics must be carefully designed not to collect duplicate information. For example, when a viewer watches parts of shows in overlapping time slots, the system must ensure that the same events are not captured twice.
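The steps described above (filter the relevant events, skip duplicates, then pair the event that opens a session with the one that closes it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event field names (event_id, viewer_id, program_id, type, ts) and the event types "start" and "stop" are hypothetical, not a real platform's event format.

```python
from collections import defaultdict

RELEVANT_TYPES = {"start", "stop"}  # assumed event names


def viewing_seconds(events):
    """Return total seconds watched per (viewer_id, program_id).

    Each event is a dict with the hypothetical keys
    event_id, viewer_id, program_id, type and ts (in seconds).
    """
    seen_ids = set()
    open_sessions = {}  # (viewer, program) -> timestamp of the opening event
    totals = defaultdict(int)

    for ev in sorted(events, key=lambda e: e["ts"]):
        if ev["type"] not in RELEVANT_TYPES:
            continue  # filter out events that are not needed
        if ev["event_id"] in seen_ids:
            continue  # the same event must not be counted twice
        seen_ids.add(ev["event_id"])

        key = (ev["viewer_id"], ev["program_id"])
        if ev["type"] == "start":
            # Keep the earliest start if a session is already open.
            open_sessions.setdefault(key, ev["ts"])
        elif key in open_sessions:
            totals[key] += ev["ts"] - open_sessions.pop(key)

    return dict(totals)


sample_events = [
    {"event_id": 1, "viewer_id": "v1", "program_id": "p1", "type": "start", "ts": 0},
    {"event_id": 2, "viewer_id": "v1", "program_id": "p1", "type": "volume", "ts": 30},
    {"event_id": 3, "viewer_id": "v1", "program_id": "p1", "type": "stop", "ts": 600},
    {"event_id": 3, "viewer_id": "v1", "program_id": "p1", "type": "stop", "ts": 600},
    {"event_id": 4, "viewer_id": "v1", "program_id": "p1", "type": "start", "ts": 900},
    {"event_id": 5, "viewer_id": "v1", "program_id": "p1", "type": "stop", "ts": 1200},
]
totals = viewing_seconds(sample_events)  # {("v1", "p1"): 900}
```

In the sample, the volume event is filtered out, the repeated stop event is ignored, and the two sessions of 600 and 300 seconds add up to 900 seconds for that viewer and program.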
What has brought this revolution in utilizing data?
Two factors are essential in enabling operators to utilize all this ‘big data’. One is the open source revolution, with the development of new technologies that can store and process big data. Frameworks that handle the processing of data include Spark (often used for real-time processing) and Hadoop (for processing extremely large amounts of data). However, in this nascent field, nothing is off-the-shelf.
Choosing the right open source technology requires research and often some experimentation to determine the correct framework for specific usages. This in turn can delay the implementation of this new technology.
Integrating the data with the reporting software also has its challenges: again, no specific technology is available out of the box, and extensive integration work is usually required. TV service providers are often unaware of the complexity involved at each step, and may underestimate how long implementing these new capabilities will take.
The cloud is the other game-changer, providing agility, scalability, high availability, and almost unlimited storage capacity. Sports and other live events in particular challenge service providers to deliver excellent quality of service to massive audiences, and these traffic spikes also generate large volumes of data. The cloud is able to absorb these spikes in a way that on-premises infrastructure rarely can. It is the cloud that enables the powerful processing and computation, in addition to scaling during peak periods.
What particular challenges exist in the TV industry?
With the growth of ‘any device, anywhere’, data is generated on various devices and in many locations. The living room is no longer the only place that viewers are watching, and subscribers who travel may choose to watch their OTT service from home instead of the local content.
Data and metadata collected from a mobile phone can differ from the information collected on a set-top box (STB). The mobile device is built to provide its location, whereas an STB, especially a legacy model, may not be able to provide this information at all. If location metrics are important to the TV service provider, the TV platform will have to use other means of locating the STB on the network.
Mobile devices vary widely; Android fragmentation in particular results in a broad range of operating system versions and device types that must be supported. While iOS is far more uniform, it is not as open as Android, so obtaining specific metrics may involve some fine-tuning.
Other challenges include coping with sudden bursts in the number of viewers, such as for sports and other live events. These events generate large amounts of data because large audiences tune in simultaneously. Service providers need to ensure that their systems are able to handle these spikes in the data collected. Here the cloud is essential, handling the spikes that on-premises systems can rarely cope with.
How do TV service providers cope with the magnitude of data that is currently generated?
For TV service providers, determining the number of unique views for a given measurement is critical to obtaining relevant insights. Each counter, such as the start/stop of a viewing, must be saved so that the next counter can be compared to it. Only once it has been processed can the non-essential data be discarded. Needless to say, the number of counters that accumulate can grow rapidly, requiring additional storage capacity.
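As a rough illustration of why state has to be kept between events, the sketch below counts unique viewers per program. The class name, method names and identifiers are hypothetical, and a production system would typically keep this state in a database or a stream processor rather than in memory.

```python
from collections import defaultdict


class UniqueViewCounter:
    """Counts distinct viewers per program from a stream of view events."""

    def __init__(self):
        # State retained between events, so that each new event can be
        # compared with what has already been seen.
        self._viewers_by_program = defaultdict(set)

    def record(self, viewer_id, program_id):
        """Register a view; return True if this viewer is new for the program."""
        viewers = self._viewers_by_program[program_id]
        is_new = viewer_id not in viewers
        viewers.add(viewer_id)
        return is_new

    def unique_views(self, program_id):
        """Return the number of distinct viewers recorded for the program."""
        return len(self._viewers_by_program.get(program_id, ()))
```

Once an event has been recorded, the raw event itself can be discarded, but the set of viewer identifiers must stay. That retained state is what grows with the audience and drives the storage requirement mentioned above.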
For operators whose subscribers are primarily using STBs, the challenges in collecting data are greater if the STBs are from legacy systems. Previous generations of STBs cannot always adapt to the new demands of smart data collection, and are only capable of sending all raw data forward without proper filtering. This quickly clogs up any processing and analysis systems, making them highly inefficient.
Later generation STBs are able to accept a smart, standalone agent that can collect data not only from STBs, but also from smart TVs, web browsers and other devices. The agent filters the stream so that only necessary data is passed on. This enables more efficient processing and lowers storage requirements.
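The filtering step performed by such an agent can be sketched as a small Python generator. The event types and field names below are assumptions chosen for illustration; a real agent would be configured with whatever the analytics actually need.

```python
# Assumed configuration: which event types and fields the analytics need.
NEEDED_TYPES = frozenset({"start", "stop", "channel_change"})
NEEDED_FIELDS = ("device_id", "type", "ts", "program_id")


def filter_events(raw_events, needed_types=NEEDED_TYPES, needed_fields=NEEDED_FIELDS):
    """Yield trimmed copies of only those events worth forwarding."""
    for ev in raw_events:
        if ev.get("type") not in needed_types:
            continue  # drop events the analytics do not use
        # Forward only the needed fields, dropping everything else.
        yield {k: ev[k] for k in needed_fields if k in ev}


raw = [
    {"device_id": "d1", "type": "start", "ts": 0, "program_id": "p1", "firmware": "1.2"},
    {"device_id": "d1", "type": "heartbeat", "ts": 5},
    {"device_id": "d1", "type": "stop", "ts": 60, "program_id": "p1"},
]
forwarded = list(filter_events(raw))
```

Here the heartbeat event is dropped and the firmware field is stripped, so less data travels over the network and reaches the processing systems.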
Although most viewers are using recent versions of STBs, some companies are still using legacy STBs that may not even be connected to the Internet. This means that these devices cannot be upgraded to the newest software unless a technician does it on site. It also means that the data from such an STB cannot be easily collected; instead, alternative means of monitoring the network are required.
What about privacy?
With GDPR and similar initiatives, TV service providers have to ensure that personal data is aggregated and anonymized as required. When more granular data is required, the proper consent must be obtained.
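One common way to aggregate viewing data is to report only counts per program and to suppress groups that are too small. The sketch below shows the idea; the function name and the threshold of 5 are illustrative assumptions, not values taken from GDPR or from any regulator.

```python
from collections import defaultdict


def aggregate_views(records, min_group_size=5):
    """Return {program_id: distinct viewer count}, omitting small groups.

    records: iterable of (viewer_id, program_id) pairs.
    min_group_size: illustrative suppression threshold (an assumption).
    """
    viewers = defaultdict(set)
    for viewer_id, program_id in records:
        viewers[program_id].add(viewer_id)
    # Only counts leave this function; viewer identifiers do not.
    return {
        program: len(ids)
        for program, ids in viewers.items()
        if len(ids) >= min_group_size
    }
```

Whether a given level of aggregation is sufficient depends on the data and the applicable rules, so the threshold should be set with that guidance in mind.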
How does proper data management help TV service providers?
Having accurate data about what audiences are watching is essential in content purchasing decisions. Determining which content is most popular guides decisions about when to schedule certain content, how to bundle it, how much should be paid to purchase it, and many additional factors. This data is also critical for targeted advertising, particularly drop-off points (so that ads can be played before viewers drop off) and other viewing habits.
Without proper data management, the accuracy of the analytics can be impaired, leading to imprecise or even incorrect results (Garbage In/Garbage Out). Efficient data management enables faster processing and report generation and better use of storage resources.
What do TV service providers need to enable efficient data collection?
As we have discussed, collecting and managing the large amounts of data generated in TV systems is a complex task, and no ‘out of the box’ solution is available. However, consultation with experts such as VO’s professional services can shorten and optimize the process. TV service providers should utilize a smart Data Collection Agent (DCA) that sits on the connected devices. The DCA collects the data and filters it so it only forwards relevant statistics, enabling faster processing and more accurate analytics and reports. VO’s solutions are optimized for cost effectiveness, scalability and efficiency to offer data-driven TV applications such as business analytics and audience measurement that provide the smart insights that TV service providers require to run their businesses.
With special thanks to Ludo Rubin and Alexey Belyak for their assistance with this post.