Porting Video Fingerprinting to Mobile: From Frame Extraction to Bloom‑Filter Retrieval
This article details how to migrate a video‑fingerprinting pipeline—covering video frame extraction, Hessian‑Affine + SIFT feature computation, JPEG and BLAS dependencies, multi‑threading, NEON acceleration, package‑size reductions, and a Bloom‑filter based retrieval system—onto iOS and Android devices while addressing practical pitfalls and performance trade‑offs.
1. Video Frame Extraction
Each video is sampled to obtain up to ten key frames. Pairwise Pearson correlation is used to discard frames that are too similar (correlation > 0.9). The remaining frames are treated as independent images for feature extraction.
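The similarity filter can be sketched as follows. This is a minimal illustration rather than the pipeline's actual code: the function names, the 8-bit grayscale frame layout, and the greedy keep-or-drop order are all assumptions. The test compares squared quantities, so no square root is needed.

```c
#include <stddef.h>

/* Returns 1 if the Pearson correlation of two equally sized grayscale
   frames exceeds `threshold` (assumed >= 0), else 0. Flat frames (zero
   variance) have no defined correlation and are reported as not similar. */
int frames_too_similar(const unsigned char *a, const unsigned char *b,
                       size_t n, double threshold) {
    double sum_a = 0.0, sum_b = 0.0;
    for (size_t i = 0; i < n; ++i) { sum_a += a[i]; sum_b += b[i]; }
    double mean_a = sum_a / (double)n, mean_b = sum_b / (double)n;

    double cov = 0.0, var_a = 0.0, var_b = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double da = a[i] - mean_a, db = b[i] - mean_b;
        cov += da * db;
        var_a += da * da;
        var_b += db * db;
    }
    if (var_a == 0.0 || var_b == 0.0 || cov <= 0.0) return 0;
    /* r > t  <=>  cov^2 > t^2 * var_a * var_b  when cov > 0 and t >= 0 */
    return cov * cov > threshold * threshold * var_a * var_b;
}

/* Greedy filter: a frame is kept only if it is not too similar to any
   frame kept before it. Writes kept indices to `keep`, returns how many. */
size_t select_key_frames(const unsigned char *const *frames, size_t count,
                         size_t n, double threshold, size_t *keep) {
    size_t kept = 0;
    for (size_t i = 0; i < count; ++i) {
        int distinct = 1;
        for (size_t k = 0; k < kept; ++k) {
            if (frames_too_similar(frames[i], frames[keep[k]], n, threshold)) {
                distinct = 0;
                break;
            }
        }
        if (distinct) keep[kept++] = i;
    }
    return kept;
}
```

Calling select_key_frames with a threshold of 0.9 on the sampled frames would leave only the indices of frames that pass the correlation test described above.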
2. Porting Image Feature Extraction to Mobile
2.1 Hessian‑Affine + SIFT source code
The VISE project provides the detect_points (Hessian‑Affine) and compute_descriptors (SIFT) implementations. Relevant files are:
src/external/KMCode_relja/exec/detect_points/detect_points.cpp
src/external/KMCode_relja/exec/compute_descriptors/compute_descriptors.cpp
Both functions accept a JPEG filename, load the image, and output feature regions and descriptors.
2.2 libjpeg‑turbo and BLAS dependency handling
VISE depends on the system jpeg library. For iOS/Android we replace it with libjpeg‑turbo compiled for the target platform (see https://github.com/libjpeg-turbo/libjpeg-turbo).
The original code uses the yael sub‑project, which links against a full BLAS library (OpenBLAS). Only the matrix‑multiply routine sgemm is required, so a lightweight custom implementation is provided to avoid pulling the entire BLAS stack.
2.3 Platform‑specific compatibility fixes
The function fvec_new in common/yael_v260_modif/yael/vector.c allocated memory differently on Linux, iOS, and other platforms. The updated code uses:
malloc on Linux and iOS
memalign(16, sizeof(float)*n) on platforms that require 16‑byte alignment (e.g., ARM)
Proper error handling is added to abort on allocation failure.
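A portable variant of such an allocator might look like the sketch below. It is not the patched yael code: the name fvec_new_aligned is hypothetical, and posix_memalign is swapped in for memalign because it is the POSIX-standard call. Whether that swap suits a given toolchain is an assumption to verify.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for fvec_new: returns room for n floats whose
   address is a multiple of 16 bytes, and aborts if allocation fails. */
float *fvec_new_aligned(size_t n) {
    void *p = NULL;
    if (posix_memalign(&p, 16, sizeof(float) * n) != 0) {
        fprintf(stderr, "fvec_new_aligned: cannot allocate %zu floats\n", n);
        abort();
    }
    return (float *)p;
}
```

Memory obtained this way is released with free, so call sites written against a malloc-based fvec_new would not need to change.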
3. Mobile Optimizations
3.1 Speed optimizations
IO reduction: Feature descriptors are kept in memory; intermediate .siftgeo files are eliminated.
Feature count control: The number of detected points per image is capped at 500 by adjusting the harris_lap function in src/external/KMCode_relja/descriptor/CornerDetector.cpp. This reduces processing time from ~50 s per frame to ~16 s on a typical smartphone.
NEON SIMD: Compilation flags enable ARM NEON instructions for vectorised arithmetic.
Multithreading: Up to eight threads process frames in parallel, each handling a single frame.
Algorithmic tweaks: Gaussian blur loops in gauss_iir.cpp are reordered to skip zero‑valued operations and exploit multiplication associativity, yielding a several‑fold speedup.
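To give a feel for what NEON vectorisation buys, here is a small sketch of a dot product with an explicit NEON path. The article only states that compiler flags were enabled, so the intrinsics and the function name dot_f32 are illustrative assumptions, not code from the project. On targets without NEON the scalar loop handles everything.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Dot product of two float vectors of length n. Processes four floats
   per iteration with NEON when available; the scalar loop covers the
   tail, or the whole vector on non-NEON targets. */
float dot_f32(const float *a, const float *b, size_t n) {
    size_t i = 0;
    float sum = 0.0f;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4) {
        /* acc += a[i..i+3] * b[i..i+3], lane by lane */
        acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    }
    /* Horizontal add of the four accumulator lanes */
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    sum = vget_lane_f32(vpadd_f32(half, half), 0);
#endif
    for (; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
```

On 32-bit ARM builds with GCC or Clang, NEON typically has to be requested with a flag such as -mfpu=neon, while arm64 targets have it available by default.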
3.2 Package‑size reductions
Custom sgemm: A hand‑written sgemm function replaces the full OpenBLAS library (see code snippet below).
Header‑based parameters: Binary parameter files such as sift.pre_alpha.0.50.desc_covariance are converted into C header arrays, removing runtime file reads.
Code pruning: Only the necessary detection, description, and hash‑generation modules are retained; unrelated VISE source files are stripped.
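/* Minimal single-precision matrix multiply: C = alpha * op(A) * op(B) + beta * C,
   where op(X) is X or its transpose. M, N, K are the dimensions of the product;
   lda, ldb, ldc are the row strides. Matrices are indexed row-major here,
   unlike the column-major convention of reference BLAS. */
void sgemm(char *transa, char *transb, const int M, const int N, const int K,
           const float alpha, const float *a, const int lda,
           const float *b, const int ldb,
           const float beta, float *c, const int ldc) {
    // Simple triple-loop implementation with optional transposition handling
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < K; ++k) {
                // 'T' selects the transposed operand; anything else reads it as stored
                float a_val = (*transa == 'T') ? a[k * lda + i] : a[i * lda + k];
                float b_val = (*transb == 'T') ? b[j * ldb + k] : b[k * ldb + j];
                sum += a_val * b_val;
            }
            c[i * ldc + j] = alpha * sum + beta * c[i * ldc + j];
        }
    }
}

4. Bloom‑Filter Based Retrieval System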
The video fingerprint database is a three‑dimensional list A, indexed first by hash_id and then by hash_value, whose innermost lists hold video IDs. Each feature point yields a pair (hash_id, hash_value), and the ID of the video it came from is inserted into A[hash_id][hash_value]. During retrieval, each query feature is hashed the same way to fetch candidate video IDs, and a TF‑IDF weighted score is accumulated for each candidate.
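The lookup structure can be sketched as a small fixed-size table. Everything here is an assumption made for illustration: the table dimensions, the names, and in particular the weighting. The article does not give its exact TF‑IDF formula, so each hit simply adds 1 / (number of videos in the bucket), a simplified rarity weight standing in for a real IDF term.

```c
#include <string.h>

#define NUM_HASHES   4    /* range of hash_id (assumed) */
#define NUM_VALUES   256  /* range of hash_value (assumed) */
#define MAX_POSTINGS 8    /* video IDs stored per bucket (assumed) */
#define MAX_VIDEOS   16   /* video IDs are 0 .. MAX_VIDEOS-1 (assumed) */

typedef struct {
    int ids[MAX_POSTINGS];
    int count;
} Bucket;

typedef struct {
    Bucket A[NUM_HASHES][NUM_VALUES];  /* A[hash_id][hash_value] -> video IDs */
} FingerprintDb;

void db_init(FingerprintDb *db) { memset(db, 0, sizeof *db); }

/* Adds video_id to A[hash_id][hash_value]. Returns 0 on success (or if
   already present), -1 if an index is out of range or the bucket is full. */
int db_insert(FingerprintDb *db, int hash_id, int hash_value, int video_id) {
    if (hash_id < 0 || hash_id >= NUM_HASHES ||
        hash_value < 0 || hash_value >= NUM_VALUES ||
        video_id < 0 || video_id >= MAX_VIDEOS) return -1;
    Bucket *b = &db->A[hash_id][hash_value];
    for (int i = 0; i < b->count; ++i)
        if (b->ids[i] == video_id) return 0;
    if (b->count >= MAX_POSTINGS) return -1;
    b->ids[b->count++] = video_id;
    return 0;
}

/* Votes for one query feature: every video in the matching bucket gains
   1/bucket_size, so hashes shared by many videos count for less. */
void db_vote(const FingerprintDb *db, int hash_id, int hash_value,
             double scores[MAX_VIDEOS]) {
    if (hash_id < 0 || hash_id >= NUM_HASHES ||
        hash_value < 0 || hash_value >= NUM_VALUES) return;
    const Bucket *b = &db->A[hash_id][hash_value];
    for (int i = 0; i < b->count; ++i)
        scores[b->ids[i]] += 1.0 / b->count;
}

/* Returns the video ID with the highest score, or -1 if nothing scored. */
int db_best(const double scores[MAX_VIDEOS]) {
    int best = -1;
    double best_score = 0.0;
    for (int v = 0; v < MAX_VIDEOS; ++v)
        if (scores[v] > best_score) { best_score = scores[v]; best = v; }
    return best;
}
```

A query calls db_vote once per extracted feature with a zero-initialised scores array and then reads off the winner with db_best. A production index would need growable buckets and a threshold on the winning score to reject videos that are not in the library.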
Evaluation on a library of 959 videos (283 queries) produced:
Recall: 0.978799
Precision: 0.975352
F‑score (2PR/(P+R)): 0.977
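The F‑score formula is simple enough to check directly. The helper below is a sketch with a hypothetical name, included only to make the arithmetic reproducible.

```c
/* Harmonic mean of precision and recall: 2PR / (P + R).
   Returns 0 when both inputs are 0 to avoid dividing by zero. */
double f_score(double precision, double recall) {
    double denom = precision + recall;
    return denom > 0.0 ? 2.0 * precision * recall / denom : 0.0;
}
```

With the figures above, f_score(0.975352, 0.978799) evaluates to roughly 0.97707, consistent with the reported 0.977.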
5. Pitfalls Encountered
PatchMask initialization: The global variable patch_mask in src/external/KMCode_relja/descriptor/Corner.cpp was accessed before initPatchMask ran, causing inconsistent fingerprints on the first run. Moving the call to initPatchMask earlier in the initialization sequence resolved the issue.
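A different way to guard against this class of bug is to hide the global behind an accessor that initialises it on first use. The sketch below shows that alternative, not the fix described above. The mask size, its circular contents, and all names are assumptions and do not reflect the actual VISE code.

```c
#define PATCH_SIZE 41  /* assumed mask width and height */

static float patch_mask[PATCH_SIZE * PATCH_SIZE];
static int patch_mask_ready = 0;

/* Fills the mask with 1 inside a centred circle and 0 outside.
   The real mask contents are not given in the article. */
static void init_patch_mask(void) {
    const int r = PATCH_SIZE / 2;
    for (int y = 0; y < PATCH_SIZE; ++y) {
        for (int x = 0; x < PATCH_SIZE; ++x) {
            int dx = x - r, dy = y - r;
            patch_mask[y * PATCH_SIZE + x] =
                (dx * dx + dy * dy <= r * r) ? 1.0f : 0.0f;
        }
    }
    patch_mask_ready = 1;
}

/* All readers go through this accessor, so the mask can never be
   observed uninitialised. Not thread-safe: call it once from the main
   thread before any worker threads start. */
const float *get_patch_mask(void) {
    if (!patch_mask_ready) init_patch_mask();
    return patch_mask;
}
```

Because the flag check is unsynchronised, this pattern only helps in a multithreaded pipeline if the first call happens before the worker threads are created.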
6. Conclusion
Deploying video fingerprinting on mobile devices requires balancing computational load and binary size. The presented engineering steps—frame selection, feature‑count limiting, SIMD, multithreading, custom BLAS, and header‑based parameters—significantly improve speed and reduce package size. Future work may explore OpenCL acceleration, removal of the SVD step, or fusion with audio features.
References
https://github.com/andrefaraujo/videosearch
https://github.com/ox-vgg/vise
https://github.com/libjpeg-turbo/libjpeg-turbo
