The Teknomo–Fernandez algorithm (TF algorithm) is an efficient algorithm for generating the background image of a given video sequence.
The TF algorithm produces the background image from a video of a street with many pedestrians crossing.
By assuming that the background image is shown in the majority of the video, the algorithm is able to generate a good background image of a video in \( O(R) \)-time, where \( R \) is the resolution of an image, using only a small number of binary operations and Boolean bit operations. These operations require a small amount of memory and are available as built-in operators in many programming languages such as C, C++, and Java.[1][2][3]
History
The TF algorithm generates the colored background image and uses it for background subtraction.
People tracking from videos usually involves some form of background subtraction to segment foreground from background. Once foreground images are extracted, then desired algorithms (such as those for motion tracking, object tracking, and facial recognition) may be executed using these images.[1][3]
However, background subtraction requires that the background image is already available, and unfortunately this is not always the case. Traditionally, the background image is obtained by searching the video, manually or automatically, for frames in which no objects are present. More recently, automatic background generation through object detection, median filtering, medoid filtering, approximated median filtering, linear predictive filtering, non-parametric models, Kalman filtering, and adaptive smoothing has been suggested; however, most of these methods have high computational complexity and are resource-intensive.[1][4]
The Teknomo–Fernandez algorithm is also an automatic background generation algorithm. Its advantages, however, are its computational speed of only \( O(R) \)-time, which depends only on the resolution \( R \) of an image, and its accuracy, which is attained within a manageable number of frames. As few as three frames from a video are needed to produce the background image, assuming that for every pixel position the background occurs in the majority of the frames. Furthermore, the algorithm can be applied to both grayscale and colored videos.[1]
Assumptions
The camera is stationary.
The light of the environment changes only slowly relative to the motions of the people in the scene.
People do not occupy the same place in the scene for most of the time.
More generally, however, the algorithm works whenever the following single important assumption holds:
For each pixel position, the majority of the pixel values in the entire video contain the pixel value of the actual background image (at that position).[1]
As long as each part of the background is shown in the majority of the video, the algorithm is expected to work accurately; the entire background image need not appear in any single frame.[1]
Background image generation
Equations
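For three frames of the image sequence, \( x_{1} \), \( x_{2} \), and \( x_{3} \), the background image \( B \) is obtained using the following Boolean expression, in which juxtaposition denotes AND, \( + \) denotes OR, and \( \oplus \) denotes XOR: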
\( {\displaystyle B=x_{3}(x_{1}\oplus x_{2})+x_{1}x_{2}} \)[1]
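The expression above maps directly onto bitwise operators. The following Python fragment is a minimal sketch, assuming each frame is a flat list of non-negative integer pixel values; the function name combine_three and the sample frames are illustrative and do not come from the cited paper.

```python
def combine_three(x1, x2, x3):
    """Per-bit majority of three equal-length frames of integer pixel values."""
    if not (len(x1) == len(x2) == len(x3)):
        raise ValueError("frames must have the same number of pixels")
    # B = x3 (x1 XOR x2) + x1 x2, with AND as the product and OR as the sum.
    return [(c & (a ^ b)) | (a & b) for a, b, c in zip(x1, x2, x3)]


# Three 4-pixel grayscale frames. A foreground value (200) covers a different
# pixel in each frame; the true background is 17, 34, 51, 68.
frame1 = [200, 34, 51, 68]
frame2 = [17, 200, 51, 68]
frame3 = [17, 34, 200, 68]
print(combine_three(frame1, frame2, frame3))  # [17, 34, 51, 68]
```

Because the majority is taken bit by bit, a pixel at which all three frames disagree can yield a value that appears in none of them; for example, the values 1, 2, and 4 combine to 0.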
The Boolean mode function \( S \) takes the value 1 when the number of 1 entries is larger than half of the number of images, such that[1]
\( {\displaystyle S={\begin{cases}1,&{\text{if }}\sum _{i=1}^{n}x_{i}\geq \left\lfloor {\frac {n}{2}}\right\rfloor +1,{\text{ and }}n\geq 3\\0,&{\text{otherwise}}\end{cases}}} \)
For three images, the background image B can be taken as the value
\( {\displaystyle {\bar {x}}_{1}x_{2}x_{3}+x_{1}{\bar {x}}_{2}x_{3}+x_{1}x_{2}{\bar {x}}_{3}+x_{1}x_{2}x_{3}} \) [1]
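For single bits, the two three-image expressions and the Boolean mode (1 when more than half of the entries are 1) can be checked against each other by enumerating all eight inputs. The sketch below does this in Python; the helper names are illustrative.

```python
from itertools import product


def boolean_mode(bits):
    """Return 1 when strictly more than half of the bits are 1, else 0."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def first_form(x1, x2, x3):
    # x3 (x1 XOR x2) + x1 x2
    return (x3 & (x1 ^ x2)) | (x1 & x2)


def sum_of_products_form(x1, x2, x3):
    # The four minterms with at least two 1 entries; for a single bit,
    # 1 - x is NOT x.
    n1, n2, n3 = 1 - x1, 1 - x2, 1 - x3
    return (n1 & x2 & x3) | (x1 & n2 & x3) | (x1 & x2 & n3) | (x1 & x2 & x3)


all_agree = all(
    first_form(*bits) == sum_of_products_form(*bits) == boolean_mode(bits)
    for bits in product((0, 1), repeat=3)
)
print(all_agree)  # True
```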
Background generation algorithm
At the first level, three frames are selected at random from the image sequence and combined using the first equation to produce a background image. At the second level, three first-level background images are combined in the same way, which yields a better background image. The procedure is repeated until the desired level \( L \) is reached.[1]
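The level-by-level procedure can be sketched recursively. This Python fragment is an illustration under stated assumptions, not the authors' implementation: frames are flat lists of integer pixel values, frames are drawn with replacement, and the names combine_three and tf_background are hypothetical.

```python
import random


def combine_three(x1, x2, x3):
    """Per-bit majority of three frames: x3 (x1 XOR x2) + x1 x2."""
    return [(c & (a ^ b)) | (a & b) for a, b, c in zip(x1, x2, x3)]


def tf_background(frames, level, rng=random):
    """Estimate the background image at the given level (level >= 1)."""
    if level < 1:
        raise ValueError("level must be at least 1")
    if level == 1:
        # Level 1: combine three frames drawn at random from the video
        # (drawn with replacement here, which is an assumption).
        x1, x2, x3 = (rng.choice(frames) for _ in range(3))
    else:
        # Higher levels: combine three independent estimates from the
        # level below.
        x1, x2, x3 = (tf_background(frames, level - 1, rng) for _ in range(3))
    return combine_three(x1, x2, x3)


# Synthetic 4-pixel video: the background is 10, 20, 30, 40, and three of the
# nine frames each have one pixel covered by a foreground value of 255.
video = [[10, 20, 30, 40] for _ in range(9)]
video[0][0] = 255
video[1][1] = 255
video[2][2] = 255
estimate = tf_background(video, level=3, rng=random.Random(0))
print(estimate)
```

The result depends on which frames are sampled, so the estimate is probabilistic; higher levels make a correct result more likely at the cost of sampling more frames.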
Theoretical accuracy
At level \( \ell \), the probability \( p_\ell \) that the predicted modal bit is the actual modal bit is given by the equation \( {\displaystyle p_{\ell }=(p_{\ell -1})^{3}+3(p_{\ell -1})^{2}(1-p_{\ell -1})} \). The table below gives the computed probability values across several levels using some specific initial probabilities. It can be observed that even if the modal bit at the considered position appears in as few as 60% of the frames, the probability of accurate modal bit determination already exceeds 99% at six levels.[1]
Computed probabilities table: probability values across several levels for some specific initial probabilities.
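The probabilities can be recomputed directly from the recurrence \( p_{\ell} = (p_{\ell-1})^{3} + 3(p_{\ell-1})^{2}(1-p_{\ell-1}) \). The following Python sketch iterates it for six levels; the initial probabilities 0.6 to 0.9 are illustrative choices and are not necessarily those used in the cited paper.

```python
def next_probability(p):
    """One level of the recurrence: at least two of three inputs are correct."""
    return p**3 + 3 * p**2 * (1 - p)


def probabilities(p0, levels):
    """Return [p_0, p_1, ..., p_levels] starting from the initial probability p0."""
    values = [p0]
    for _ in range(levels):
        values.append(next_probability(values[-1]))
    return values


# One row per initial probability, one column per level 1..6.
for p0 in (0.6, 0.7, 0.8, 0.9):
    row = probabilities(p0, 6)
    print(p0, " ".join(f"{p:.4f}" for p in row[1:]))
```

Starting from 0.6, the first level gives 0.648 and the sixth level is above 0.99, while the fifth level is still below it, which agrees with the observation in the text.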
Space complexity
The space requirement of the Teknomo–Fernandez algorithm is given by the function \( {\displaystyle O(RF+R3^{L})} \), depending on the resolution R of the image, the number F of frames in the video, and the desired number L of levels. However, because L will probably not exceed 6, the factor \( 3^{L} \) is bounded by a constant, which reduces the space complexity to \( {\displaystyle O(RF)} \).[1]
Time complexity
The entire algorithm runs in \( {\displaystyle O(R)} \)-time, depending only on the resolution of the image. Computing the modal bit for each bit can be done in \( O(1) \)-time, while computing the resulting image from three given images can be done in \( {\displaystyle O(R)} \)-time. The number of images to be processed in L levels is \( {\displaystyle O(3^{L})} \). However, since \( {\displaystyle L\leq 6} \), this number is actually \( O(1) \), and thus the algorithm runs in \( {\displaystyle O(R)} \)-time.[1]
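As a rough worked count, assume that each image at level \( \ell \) is built from three independently generated images at level \( \ell - 1 \), with level 1 built from three sampled frames (this reading of the procedure is an assumption, and the symbols \( N_{\text{frames}} \) and \( N_{\text{combine}} \) are introduced here for illustration). The number of sampled frames and the number of three-image combinations are then

\[ N_{\text{frames}} = 3^{L}, \qquad N_{\text{combine}} = \sum_{\ell=1}^{L} 3^{L-\ell} = \frac{3^{L}-1}{2}. \]

For \( L = 6 \) this gives 729 sampled frames and 364 combinations. Each combination costs \( O(R) \), so the total work is a constant multiple of \( R \).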
Variants
A variant of the Teknomo–Fernandez algorithm, named CRF, that incorporates the Monte Carlo method has been developed. Two different configurations of CRF were implemented: \( \mathrm{CRF}_{9,2} \) and \( \mathrm{CRF}_{81,1} \). Experiments on some colored video sequences showed that the CRF configurations outperform the TF algorithm in terms of accuracy. However, the TF algorithm remains more efficient in terms of processing time.[5]
Applications
Object detection
Face detection
Face recognition
Pedestrian detection
Video surveillance
Motion capture
Human-computer interaction
Content-based video coding
Traffic monitoring
Real-time gesture recognition
References
[1] Teknomo, Kardi; Fernandez, Proceso (2015). "Background Image Generation Using Boolean Operations". arXiv:1510.00889 [cs.CV].
[2] Abu, Patricia Angela; Fernandez, Proceso. "Performance Comparison of the Teknomo–Fernandez Algorithm on the RGB and HSV Colour Spaces".
[3] Abu, Patricia Angela (March 2015). Improving the Teknomo–Fernandez Background Image Modeling Algorithm for Foreground Segmentation (Ph.D). Ateneo de Manila University.
[4] Abu, Patricia Angela; Fernandez, Proceso (March 2016). Modifying the Teknomo–Fernandez Algorithm for Accurate Real-Time Background Subtraction. Philippine Computing Science Congress.
[5] Abu, Patricia Angela; Chu, Varian Sherwin; Fernandez, Proceso. "A Monte-Carlo-based Algorithm for Background Generation".
Further reading
Chu, Varian Sherwin B. (2013). Background image reconstruction using random frame sampling and logical bit operations (Thesis). Ateneo de Manila University.
Abu, Patricia Angela R. (2015). Improving the Teknomo-Fernandez Background Image Modeling Algorithm for Foreground Segmentation (Thesis). Ateneo de Manila University.