Introduction
The quality, latency and smoothness of live view streaming are dictated by many factors:
The available streams from the camera.
The connection speed both on site and on the viewing device.
The performance of the viewing device.
The underlying streaming technology that is being used.
Other manufacturer specific trickery/tweaking that may or may not be desirable.
Some of these are self-explanatory but some are a matter of trade-offs and priorities. This document explains the constraints and the decisions taken by TetherX when it comes to live view streaming.
Available Streams from Cameras
Typically cameras will provide 2 video streams as well as the ability to request a single image. These limitations dictate what is and isn’t possible when it comes to both recording and streaming.
💡 Note: More expensive and speciality cameras will provide additional video streams and custom streams of crops/de-warps of the same image (e.g. 180 or 360 cameras).
The streams are usually accessed through RTSP and the images through HTTP. For example:
Primary Stream: 6 megapixels, 6mbit, H.265, 25fps RTSP stream
Secondary stream: D1, 1mbit, H.264, 10fps RTSP stream
Snapshot URL: Full resolution, 1 image over HTTP (refreshed 1 time per second)
The above is just an example, as this varies widely depending on the hardware and customer requirements.
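To put the example bitrates in perspective, the sketch below converts a stream bitrate into storage consumed per day. It assumes a constant bitrate and decimal units (1 mbit = 1,000,000 bits, 1 GB = 1,000,000,000 bytes); the function name is illustrative and not part of TetherX.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def storage_gb_per_day(bitrate_mbit: float) -> float:
    """Storage used in one day by a constant-bitrate stream, in GB (decimal)."""
    bits_per_day = bitrate_mbit * 1_000_000 * SECONDS_PER_DAY
    return bits_per_day / 8 / 1_000_000_000


primary_gb = storage_gb_per_day(6)    # the 6mbit primary stream from the example
secondary_gb = storage_gb_per_day(1)  # the 1mbit secondary stream from the example
print(f"Primary: {primary_gb:.1f} GB/day, secondary: {secondary_gb:.1f} GB/day")
```

Real cameras often use a variable bitrate, so treat the result as a rough upper estimate rather than a measurement.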
Formats
H.265 - This is arguably the best currently available video format for most video applications, including video surveillance. The resulting space consumption is around 50% of H.264 for equivalent video quality (though your results may vary depending on the quality of the H.265 encoder chip and settings in your camera and/or recorder).
Please note this format is not compatible with older devices and operating systems. At present (Q3 2023) support is limited to:
Windows (version 10 1709 or newer) and only when HEVC video extensions from the Microsoft Store are installed for devices with hardware support
Windows (version 8 or newer), macOS (Big Sur 11.0 or newer), Android, Linux (Chrome 108.0.5354.0 or newer) and Chrome OS platforms with hardware support
H.264 - This is the only video format widely compatible across all devices, most notably older Windows and Android devices that in some cases cannot play H.265. Please refer to https://caniuse.com/hevc
VP9 (and older) - This is Google’s attempt at an open source H.265 alternative. While a great format on paper, it has not been embraced by the industry and therefore does not see wide hardware support. Because of this, it tends to use significantly more resources to both record and display and is unfortunately not recommended when dealing with many simultaneous streams.
MPEG4 - This is a legacy video format that is not generally compatible with modern devices and needs to be transcoded, leading to additional hardware costs. You will not see it on more recent cameras / recorders.
MJPEG - Usually 5 to 25x larger in size than video for the same number of frames. While prominent on older cameras and recorders, most equipment since around 2008 has switched to video formats.
Transcoding vs Proxying
TetherX supports both transcoding and proxying/packaging. The differences between the two are as follows:
Transcoding is the process of taking the stream from a camera, decoding it and then re-encoding it into a different format. This is the ideal method to hit a specific requirement of Internet bandwidth and viewing size, but it is in most cases cost prohibitive as it requires power hungry and expensive hardware.
For context, a TetherBox Pico capable of recording 10 cameras will only be able to record 2 if transcoding (you can enable transcoding in TetherX by editing a camera and switching to the advanced tab).
Proxying & Packaging is what is used in most cases, where the original video stream is maintained but is packaged into different containers, for example .mp4 for video recording and .ts for local and high-def live streaming (see streaming technologies below).
This means that ultimately, you are at the mercy of the camera/recorder. If you need to open 16 cameras and each one is 1mbit, but your upload speed is only 10mbit, it will simply not work because the stream size cannot be adjusted without either:
Changing your camera settings (what budget equipment typically does) OR
Transcoding, which is very costly.
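The arithmetic behind the 16 camera example can be sketched as follows. This is a simplified check that ignores protocol overhead and any other traffic on the connection; the function names are illustrative.

```python
def can_sustain(camera_count: int, bitrate_mbit_each: float, upload_mbit: float) -> bool:
    """True if the upload link can carry every proxied stream at its native bitrate."""
    return camera_count * bitrate_mbit_each <= upload_mbit


def max_cameras(bitrate_mbit_each: float, upload_mbit: float) -> int:
    """Largest number of streams of the given bitrate that fit in the upload link."""
    return int(upload_mbit // bitrate_mbit_each)


# 16 cameras at 1mbit each need 16mbit, which does not fit in a 10mbit upload.
print(can_sustain(16, 1, 10))
# At 1mbit per stream, the same link carries at most 10 streams.
print(max_cameras(1, 10))
```

Because proxying cannot change the bitrate of a stream, the only ways to make the check pass are the two options listed above: lower the bitrate on the camera, or transcode.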
Video vs Images
TetherX uses a combination of images and video to achieve different goals. It is important to understand the differences and advantages/disadvantages of each, which in turn dictate when either is used in TetherX.
The main difference between image and video compression is that every image is self-contained, whereas every video frame is part of a stream. While you may skip an image, skipping a frame in a video stream leads to parts of the image breaking / disappearing:
Systems that ignore broken/skipped data (e.g. streaming over UDP) will show a broken image, while many systems will simply show a loading spinner until the next key frame arrives. How long that takes is controlled by the key-frame interval set on the camera/recorder, which is typically 1 second (the same number of frames as the fps) but is sometimes much longer.
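As a rough illustration of why the key-frame interval matters, the sketch below computes the longest time a viewer may wait for the next key frame. It assumes the interval is configured as a number of frames and that the frame-rate is constant; the function name is illustrative.

```python
def worst_case_key_frame_wait(key_interval_frames: int, fps: float) -> float:
    """Longest wait in seconds before the next key frame arrives."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return key_interval_frames / fps


# Key frame every 25 frames at 25fps: up to 1 second of waiting.
print(worst_case_key_frame_wait(25, 25))
# A much longer interval of 250 frames at 25fps: up to 10 seconds of waiting.
print(worst_case_key_frame_wait(250, 25))
```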
Because each image is self-contained, a sequence of images at 25 images per second typically has a 5 to 25x higher bitrate than the equivalent video. It may initially seem obvious that video should be used when accessing remotely; however, there are important underlying trade-offs that make this a much tougher question, which needs to be evaluated on a case by case basis according to the following:
Advantages of Images
Initial image loads 5-10x quicker than video (you need to load 1 frame, not 25 or potentially hundreds with a larger key-interval setting on the camera/recorder).
Handles bandwidth fluctuations seamlessly (images can be skipped which is not possible with video).
Latency is lower (you load 1 frame, not 25, before showing the image, which is more suitable for PTZ / two way audio - see streaming technologies below).
Compatible with all cameras and viewing devices.
Low resource utilisation on both the viewing and recording device, which means you can view 100+ cameras simultaneously.
Image transcoding/transizing is an efficient method to reduce bitrate when viewing more than 16 simultaneous cameras.
The recording quality is not impacted when viewing live (some manufacturers reduce recording quality to accommodate live view).
Disadvantages of Images
Much higher bandwidth (5x to 25x) to achieve the same fps in live view (trade-off of quality vs smoothness).
This leads to a much lower FPS in practice when bandwidth is kept the same.
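The effect on frame-rate can be estimated from the 5x to 25x figure. This sketch assumes the size factor applies uniformly to every frame, which is a simplification, and uses an illustrative function name.

```python
def image_fps_at_video_bandwidth(video_fps: float, size_factor: float) -> float:
    """Frame-rate achievable with images when limited to the bandwidth of the video stream."""
    if size_factor <= 0:
        raise ValueError("size_factor must be positive")
    return video_fps / size_factor


# At the bandwidth of a 25fps video stream, images that are 5x to 25x larger
# arrive at between 5 and 1 frames per second.
for factor in (5, 25):
    print(factor, image_fps_at_video_bandwidth(25, factor))
```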
Advantages of Video
Much lower bandwidth (5x to 25x) for the same frame-rate/smoothness.
This usually leads to a much smoother video at the same bandwidth.
Disadvantages of Video
Much higher initial loading times (5-10x longer).
The usually higher latencies make it more challenging for applications such as PTZ or two way audio (note: some cameras use low latency encoders to alleviate some of this, but it is hit and miss and leads to compatibility issues).
Will simply not load if the connection cannot sustain all required streams.
This usually restricts how many simultaneous cameras can be viewed as it is impossible to gracefully adapt the bitrate of each stream on the fly.
Lower quality than images.
Trickery is used by manufacturers to allow live view by reducing recording quality which leads to potentially losing critical video evidence.
Live View Streaming Technologies
TetherX uses all mainstream streaming technologies available and chooses between them depending on the specific requirements and constraints:
MJPEG - The technology is simple and what TetherX falls back to when other methods are not practical (e.g. resource constraints). The viewing device requests an image, once that image is downloaded, the previous image is replaced and the next image is requested. The image is always complete and in full quality and the speed is determined by the available resources.
If the device in question is limited by its performance or bandwidth, it loads the next image slower but still remains functional. This leads to a reliable live view that automatically adapts to the resources available and can sustain a virtually unlimited number of cameras viewed simultaneously. This however by design does not provide high frame-rate which viewers associate with smoothness. It also is not efficient when viewing a smaller number of cameras at higher frame-rates.
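The request, display, request cycle described above can be sketched as a simple loop. This is an illustration of the general technique, not TetherX's implementation: `fetch_image` and `show` are hypothetical callables standing in for the HTTP download and the on-screen update, and the camera here is simulated.

```python
import time
from typing import Callable, List


def image_live_view(fetch_image: Callable[[], bytes],
                    show: Callable[[bytes], None],
                    frames: int) -> float:
    """Fetch and display images one at a time; return the achieved frames per second."""
    start = time.monotonic()
    for _ in range(frames):
        image = fetch_image()  # blocks until the complete image has been downloaded
        show(image)            # replaces the previously displayed image
    elapsed = time.monotonic() - start
    return frames / elapsed if elapsed > 0 else float("inf")


def slow_camera() -> bytes:
    """Simulated camera (hypothetical): each image takes about 20ms to download."""
    time.sleep(0.02)
    return b"jpeg-bytes"


displayed: List[bytes] = []
fps = image_live_view(slow_camera, displayed.append, frames=5)
print(f"Displayed {len(displayed)} images at roughly {fps:.0f} fps")
```

Because the next request is only made once the previous image has arrived, a slower connection lowers the frame-rate instead of breaking the view.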
HLS - This is the technology used by platforms like Netflix, Amazon Prime, etc. It was pioneered by Apple and works by the browser requesting a playlist, which provides the next available video file. Typically, the stream is broken up into 1 or 2 second video files (this duration is set by the key interval setting on the camera or recorder and has a significant impact on latency).
Unless transcoding, these files simply re-package the same video stream coming from the camera and therefore, if the connection speed cannot sustain all the requested streams on screen, the video will take a long time to start and will stop after the initial second, showing a loading spinner until the next video file is fully downloaded. Effectively, sufficient bandwidth is crucial.
Even with sufficient bandwidth, the typical latency can be anywhere from 2 to 5 seconds which leads to a poor experience in applications such as PTZ or Two Way Audio.
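To make the playlist mechanism concrete, here is a minimal sketch that reads an HLS media playlist and lists the video files it advertises. The playlist text and segment file names are made up for illustration, and only the `#EXTINF` tag is interpreted; a real player handles many more tags.

```python
from typing import List, Tuple


def parse_media_playlist(text: str) -> List[Tuple[str, float]]:
    """Return (segment URI, duration in seconds) pairs from an HLS media playlist."""
    segments: List[Tuple[str, float]] = []
    duration = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:<duration>,<optional title>" precedes each segment URI
            duration = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line and not line.startswith("#") and duration is not None:
            segments.append((line, duration))
            duration = None
    return segments


# Hypothetical playlist with three 1 second segments.
EXAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:1
#EXT-X-MEDIA-SEQUENCE:120
#EXTINF:1.0,
segment120.ts
#EXTINF:1.0,
segment121.ts
#EXTINF:1.0,
segment122.ts
"""

segments = parse_media_playlist(EXAMPLE_PLAYLIST)
listed_seconds = sum(duration for _, duration in segments)
print(f"{len(segments)} segments listed, {listed_seconds} seconds of video")
```

A player that holds a few such files before starting playback is already several seconds behind the camera, which is one reason the latency figures above are measured in seconds.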
LHLS - This is an extension of the above technology that keeps the connection open instead of re-requesting the playlist each time. It can reduce latency by up to 50% but can be less reliable in some situations.
RTSP/RTP/RTMP - These are more traditional streaming technologies that are not readily compatible with most devices and require special Flash players, plugins or custom software. They can provide lower latencies than LHLS but require additional configuration and software. TetherX uses them internally to get the streams from the cameras/recorders and convert them into more compatible streaming technologies.
WebRTC - This (or similar) is the technology used for video conferencing like Zoom, Teams, Skype, etc. It typically requires transcoding the video to something like VP8/9 with special low latency encoder parameters. While yielding the best results, it is cost prohibitive in most cases and is limited to a small number of simultaneous streams (see transcoding above). This has been tested by TetherX but it is presently not used in production.
How TetherX Does It
TetherX supports most streaming technologies and switches between them depending on context.
When streaming locally (the TetherBox is able to detect if you are on the same network) and viewing fewer than 10 cameras simultaneously, TetherX switches to LHLS (see above) to show the primary video stream from each camera at full resolution/frame-rate (assuming your viewing device is capable of displaying this many streams). If this is not working well for you, you are presented with a toggle to switch off High-def streaming and fall back to loading images.
When full screening a camera (double tap/click a camera), we assume you want the higher quality and we switch to the full primary stream. If your connection is not able to sustain this stream, we automatically fall back to loading images.
When viewing multiple sites / cameras over Internet in record mode, we switch to image viewing but we transcode/transize the images depending on the size of each live view box on your screen. This ensures that you are able to view many simultaneous cameras, the latency is lower than video, it automatically adapts to the available bandwidth and it does not impact recording quality. This does however mean a lower fps.
When viewing multiple sites / cameras over Internet in monitor mode, we proxy the connection directly to the image provided by the camera. This leads to a further lowering of fps but ensures the TetherBox can monitor the health of and provide live view of 10x more cameras than in record mode.
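The rules above can be summarised as a small decision function. This is a sketch of the described behaviour, not TetherX's actual code: the names are hypothetical, and two points the text leaves open are filled with assumptions (full screen is checked first, and the image fallback follows the record/monitor mode of the site).

```python
from enum import Enum


class ViewMode(Enum):
    VIDEO_PRIMARY = "primary stream video"
    IMAGES_RESIZED = "images resized to the live view box"
    IMAGES_PROXIED = "images proxied directly from the camera"


def choose_view_mode(*, local: bool, camera_count: int, full_screen: bool,
                     record_mode: bool, high_def_enabled: bool = True,
                     connection_sustains_primary: bool = True) -> ViewMode:
    """Pick a live view mode following the rules described in the text."""
    # Assumption: the image fallback depends on record vs monitor mode.
    images = ViewMode.IMAGES_RESIZED if record_mode else ViewMode.IMAGES_PROXIED
    if full_screen:
        # Full screen prefers the primary stream and falls back to images.
        return ViewMode.VIDEO_PRIMARY if connection_sustains_primary else images
    if local and camera_count < 10 and high_def_enabled:
        # Local viewing of fewer than 10 cameras uses the primary video stream.
        return ViewMode.VIDEO_PRIMARY
    return images


print(choose_view_mode(local=True, camera_count=4, full_screen=False, record_mode=True))
print(choose_view_mode(local=False, camera_count=32, full_screen=False, record_mode=False))
```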
Learn more about using Live View in TetherX here.
How Hikvision iVMS Does It
Because of the constraints outlined above, in order to show smooth video in iVMS (and Dahua PSS) software, if the connection speeds both on site and on the viewing device cannot accommodate the streams, the software temporarily reconfigures the cameras to send a lower bitrate stream while keeping the resolution the same.
💡 Note: if you happen to kill the software at this point (or lose connection), this can cause the camera to permanently record in lower quality.
Because many cameras or recorders only support a single stream at above D1 resolution, this means that not only is the live view quality reduced, but also the video being recorded, which can lead to losing critical video evidence.
In our testing, limiting the connection to a 5mbit upload with 200ms latency and streaming just 4 cameras in HD resulted in blocky and unusable footage being recorded on the NVR. It also took over 10 seconds to get an initial image; however, the motion was smooth (25fps) once loaded.
In comparison when using images (outside of the iVMS/PSS software), the images loaded almost immediately, the image quality was significantly higher, but the frame-rate was poor (3-4fps) - still sufficient for making decisions in a control room setting, but understandably disappointing in smoothness by comparison. (Note: higher fps can still be achieved by full screening a single camera on demand).
When viewing a smaller number of cameras from the same manufacturer on a single PC or phone, it looks impressive to have smooth 25fps video on iVMS/PSS/others. However, when dealing with 100+ or even 32 cameras, from different manufacturers, different firmware versions, with multiple users, across multiple sites, with differing and fluctuating bandwidths, you begin to appreciate the trade-offs and advantages/disadvantages of each approach.
💡 Note: This approach of custom code to adjust Hikvision camera settings to achieve smoother live view is only suitable if you do not care about recording quality and you are optimising for a single manufacturer with compatible models/firmware.
TetherX is an open platform with a focus on convention over configuration. We prioritise the integrity of recorded footage above live view, and we implement approaches that work across all recorders/cameras as well as viewing devices.