
WHITEPAPER

Neural Video Representation Compression

In this paper, we introduce two recent research studies on neural video compression undertaken by the University of Bristol. They form part of BT’s Research and Network Strategy (RaNS) University programme, which supports research projects developed by BT alongside our academic partners.

These studies were supported through UK Research and Innovation’s (UKRI) Engineering and Physical Sciences Research Council (EPSRC) Industrial Cooperative Awards in Science and Engineering (ICASE) scheme. ICASE awards fund doctoral studentships that offer first-rate, challenging research training within a mutually beneficial research collaboration between academic and partner organisations.

 

Neural video compression is an emerging field that leverages machine learning techniques – particularly neural networks – to compress video data more efficiently than traditional methods, reducing file sizes while maintaining or even improving playback quality. The field is evolving rapidly, with ongoing research focused on improving the efficiency and effectiveness of these methods: innovations such as hierarchical positional encoding, advanced entropy models, and optimised training pipelines are paving the way for more robust and scalable solutions.

Study one (2023) - Video Compression with Hierarchical Encoding-based Neural Representation (HiNeRV)

Learning-based video compression is a popular area of research because it offers the potential to rival traditional video codecs. In this field, Implicit Neural Representations (INRs) have been used to represent and compress both image and video content.

One of the advantages of INRs is their relatively high decoding speed compared to other methods.

However, existing INR-based methods have not yet achieved rate-quality performance that matches the state-of-the-art in video compression. This shortfall is primarily due to the simplicity of the network architectures used, which limits their ability to represent complex video content effectively.
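To make the INR idea concrete, the toy sketch below represents video as a tiny randomly initialised MLP that maps a normalised (x, y, t) coordinate to an RGB value, so "decoding" a frame simply means evaluating the network on a coordinate grid. All layer sizes are illustrative and far smaller than any practical model.

```python
import numpy as np

# Toy implicit neural representation: the video *is* the network's weights.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 32)) * 0.1   # input layer: (x, y, t) -> hidden
W2 = rng.standard_normal((32, 3)) * 0.1   # output layer: hidden -> RGB

def inr(coord):
    """Decode one pixel from its normalised space-time coordinate."""
    h = np.maximum(coord @ W1, 0.0)          # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2)))   # sigmoid keeps RGB in (0, 1)

# Decoding is evaluation: here, the centre pixel of the first frame.
pixel = inr(np.array([0.5, 0.5, 0.0]))
```

Compressing the video then amounts to compressing `W1` and `W2`, which is exactly the part the studies below improve.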

 

In this paper, we introduce HiNeRV – a novel INR that combines bilinear interpolation with a new hierarchical positional encoding. This structure uses depth-wise convolutional layers and multi-layer perceptron (MLP) layers to create a deep, wide network architecture with significantly higher capacity, allowing HiNeRV to capture more detailed and complex video representations.
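As an illustration of the interpolation idea (not HiNeRV’s actual implementation), the sketch below stores feature vectors on a coarse 2×2 grid and bilinearly interpolates them at continuous positions, giving the network a smooth, learnable positional signal. Grid size and channel count are arbitrary.

```python
import numpy as np

# A 2x2 grid of 4-channel feature vectors (values chosen for easy checking).
grid = np.arange(2 * 2 * 4, dtype=float).reshape(2, 2, 4)

def bilinear(grid, y, x):
    """Sample the feature grid at continuous coordinates in [0, 1]."""
    gy = y * (grid.shape[0] - 1)
    gx = x * (grid.shape[1] - 1)
    y0, x0 = int(np.floor(gy)), int(np.floor(gx))
    y1 = min(y0 + 1, grid.shape[0] - 1)
    x1 = min(x0 + 1, grid.shape[1] - 1)
    wy, wx = gy - y0, gx - x0
    top = (1 - wx) * grid[y0, x0] + wx * grid[y0, x1]
    bot = (1 - wx) * grid[y1, x0] + wx * grid[y1, x1]
    return (1 - wy) * top + wy * bot

# The centre sample is the average of the four corner features.
centre = bilinear(grid, 0.5, 0.5)
```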

 

Additionally, we have developed a video codec based on HiNeRV – along with a refined pipeline for training, pruning, and quantisation. This pipeline is designed to better preserve HiNeRV’s performance during lossy model compression, ensuring that the quality of the compressed video remains high.
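A minimal sketch of the two lossy steps in such a pipeline – magnitude pruning followed by uniform quantisation – is shown below. The pruning ratio and bit depth are arbitrary illustrative choices, not HiNeRV’s actual settings.

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.standard_normal(1000)          # stand-in for model parameters

# 1. Prune: zero out the smallest 20% of weights by magnitude.
threshold = np.quantile(np.abs(weights), 0.2)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)

# 2. Quantise: map the surviving values onto 8-bit integer levels.
bits = 8
step = (pruned.max() - pruned.min()) / (2**bits - 1)
quantised = np.round((pruned - pruned.min()) / step)

# 3. Dequantise for decoding; the reconstruction error (at most step / 2
#    per weight) is the price paid for a smaller, entropy-codable model.
recovered = quantised * step + pruned.min()
max_err = np.abs(recovered - pruned).max()
```

A refined pipeline in this spirit controls how much of HiNeRV’s representation quality survives these lossy steps.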

 

We evaluated the proposed method on two well-known video datasets – UVG and MCL-JCV – to assess its performance in video compression. The results demonstrated that HiNeRV significantly outperforms all existing INR baselines. Moreover, HiNeRV shows competitive performance when compared to other learning-based codecs. Specifically, HiNeRV achieved a 72.3% overall bit rate saving over HNeRV and a 43.4% saving over DCVC on the UVG dataset, as measured by Peak Signal-to-Noise Ratio (PSNR).
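PSNR, the metric quoted above, is computed directly from the mean squared error between reference and decoded frames: PSNR = 10 · log10(255² / MSE) for 8-bit samples. The sketch below uses toy sample lists rather than real frames.

```python
import math

def psnr(reference, decoded):
    """PSNR in dB for 8-bit samples: 10 * log10(255^2 / MSE)."""
    diffs = [(r - d) ** 2 for r, d in zip(reference, decoded)]
    mse = sum(diffs) / len(diffs)
    return 10 * math.log10(255.0**2 / mse)

ref = [120] * 16          # a flat 16-sample "frame"
dec = [130] + [120] * 15  # one sample off by 10 -> MSE = 100 / 16 = 6.25
value = psnr(ref, dec)    # roughly 40.2 dB
```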

Study two (2024) - Neural Video Representation Compression (NVRC)

Previous research using INRs has shown impressive results for video compression. These methods involve training a neural network to fit a video sequence closely, then compressing the network’s parameters to create a compact version of the video. Despite their promise, the best INR-based methods still don’t match the top conventional codecs in terms of performance.

 

This paper introduces a new INR-based video compression framework called Neural Video Representation Compression (NVRC). Unlike previous works focused on the network’s architecture, NVRC aims to compress the representation itself. It does this by encoding different network parameters using various entropy models and grouping these parameters into small sets. Each group is encoded with different quantisation and entropy model parameters, also compressed using lightweight entropy models. This approach optimises the overall model rate and reduces the overhead of quantisation and entropy model parameters.
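The grouping idea can be sketched as follows, with illustrative group sizes and quantisation steps: each parameter group is quantised with its own step, and the coded size is estimated from the empirical entropy of the resulting symbols – a simple stand-in for NVRC’s learned entropy models.

```python
import numpy as np

rng = np.random.default_rng(2)
params = rng.standard_normal(256)          # stand-in for network parameters
groups = params.reshape(4, 64)             # 4 groups of 64 parameters each
steps = np.array([0.02, 0.05, 0.1, 0.2])   # a different step per group

total_bits = 0.0
for group, step in zip(groups, steps):
    symbols = np.round(group / step).astype(int)       # quantised symbols
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    entropy = -(p * np.log2(p)).sum()                  # bits per symbol
    total_bits += entropy * len(group)                 # ideal coded size
```

Coarser steps produce fewer distinct symbols and hence fewer bits, at the cost of larger quantisation error – the trade-off NVRC optimises per group.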

 

NVRC also uses an improved end-to-end training process that iteratively optimises both the rate and distortion objectives, minimising the extra computation introduced by the entropy models. Experiments show that NVRC outperforms both conventional and learning-based methods in the RGB and YUV 4:2:0 colour spaces.
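The objective being balanced can be written as loss = distortion + λ · rate. The toy numbers below are purely illustrative; in NVRC both terms come from the network and its entropy models and are minimised jointly.

```python
def rd_loss(distortion, rate_bits, lam):
    """Classic rate-distortion Lagrangian: distortion + lambda * rate."""
    return distortion + lam * rate_bits

# At the same lambda, spending more bits must buy enough quality to pay off.
low_rate = rd_loss(distortion=4.0, rate_bits=100.0, lam=0.25)      # 4 + 25 = 29.0
high_quality = rd_loss(distortion=1.0, rate_bits=400.0, lam=0.25)  # 1 + 100 = 101.0
```

Here the extra 300 bits reduce distortion by only 3, so at this λ the lower-rate operating point wins; a smaller λ would favour the higher-quality one.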

 

In summary, both HiNeRV and NVRC represent significant advances in INR-based video compression, offering a more capable network architecture and an improved training pipeline respectively. Together, they deliver superior compression performance.

What’s the connection?

While HiNeRV achieves significant coding gains over existing INR-based video coding methods, its model compression and quantisation techniques remain relatively simple – meaning they can be further improved. Additionally, HiNeRV’s training process is not fully end-to-end, which may lead to suboptimal performance. These limitations pave the way for the development of NVRC.

Next steps

Future research should prioritise reducing coding latency. Currently, both HiNeRV and NVRC must process the entire sequence before encoding, introducing system delays that are unsuitable for real-time applications. The encoding complexity of INR-based video codecs also remains high due to content overfitting, and must be reduced further to meet the demands of practical applications.
