
AI-based soccer metrics extraction app

Written by
Izan Leal & Aleix Figueres
September 17, 2025
Smarter sports analysis through AI
Fluendo has developed a cutting-edge AI application that brings tactical soccer analysis into the modern age: no wearable sensors, no manual tagging, just video input and deep learning.
This project is the result of a close collaboration with LongoMatch, a leading company in the sports industry. It combines Fluendo’s expertise in real-time multimedia and AI infrastructure with deep domain knowledge in professional sports analytics.
This system enables automatic extraction of quantitative metrics, such as player positions, team identity, and movement paths, directly from broadcast footage. It’s optimized for use in a sports video analysis app and powered by the Raven AI engine and a GStreamer backend for real-time, hardware-accelerated performance.
Key features
The app is designed to tackle real-world constraints of analyzing professional sports video:
- Scene calibration: Automatic or manual mapping of 2D video frames to a 3D field model using homography estimation.
- Player detection and tracking: Real-time detection and multi-object tracking, robust to occlusions.
- Team identification: AI-based color clustering system to identify player teams from broadcast footage.
- Trajectory smoothing: Generates polished, continuous movement paths from raw detections.
- Integrated video editor: Export annotated match segments and overlay animations directly in LongoMatch.
Architecture
Overview
This project aimed to design and implement a set of AI components capable of executing each algorithm independently. These components are developed as standalone elements and can be encapsulated as GStreamer plugins or native C# NuGet packages. This modularity enables seamless integration into a centralized task management infrastructure within the final application framework, allowing each module to be orchestrated and reused across various use cases with maximum flexibility.
This marked the birth of Fluendo’s Raven AI Engine, our unified platform built to handle AI execution and rendering capabilities within a single, multi-device, and multiplatform engine. Raven provides the foundation for efficient, real-time AI processing and visualization and is designed to integrate natively with GStreamer for high-performance media workflows.
As a result, the system architecture follows a modular design, comprising multiple AI pipelines that process video in real time:
- Capture: Video frames are read and fed into a GStreamer-based processing pipeline.
- Inference: Deep learning models detect players, extract keypoints, and calculate 3D metrics.
- Integration: Plugins allow tight integration with LongoMatch and hardware-accelerated editing workflows.
- Export: Final output, including metrics and video overlays, can be exported for analysis and sharing.

Figure 1: Stack of the AI-powered sports analytics system
Visual-to-metric conversion
This section details all the different AI algorithms, some based on Deep Learning and others on classical computer vision, used to extract the final sports performance indicators based on pure visual information (pixels).
Scene calibration
To understand player positions in real-world coordinates (meters), we estimate the homography between video frames and a canonical soccer field model. This transformation is essential for extracting spatial metrics such as distance covered, heatmaps, and tactical formations.
We developed two complementary calibration systems:
Manual calibration tool: Designed as a contingency method or for rare edge cases, this tool allows users to manually select at least four reference points on the field in a video frame and map them to their corresponding locations on a top-down model of the pitch. Once the points are selected, the system calculates a homography matrix that can be used to reproject player positions into a consistent metric space.
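Conceptually, what the manual tool computes from those four point pairs is a direct linear transform. The sketch below uses plain NumPy for self-containment (a production pipeline would more typically call OpenCV’s cv2.findHomography); the function names are ours, for illustration only.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst via the
    direct linear transform, with h33 fixed to 1.
    src, dst: (4, 2) arrays of corresponding points
    (e.g., pixel coordinates -> field coordinates in meters)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pt):
    """Reproject a single image point into field coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w
```

For example, mapping a unit square onto a 105 × 68 m pitch model lets `project` turn any pixel inside that square into metric field coordinates, which is exactly the reprojection step the calibration system feeds to the metric extraction stages.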
Automatic calibration system: Our primary solution uses a deep learning pipeline based on an autoencoder architecture. The model detects key landmarks (e.g., penalty areas, center circle, corner arcs) in a single video frame and uses them to compute the field’s spatial layout. This fully automated approach removes the need for human input and significantly accelerates the video analysis workflow.
To address the scarcity of annotated calibration datasets in sports, we developed a synthetic data generation pipeline using Blender. We simulated over 6,000 high-resolution images across four different stadium models, varying:
- Camera position and zoom
- Lighting conditions
- Pitch textures
- Presence of lines and visual obstructions
After training, we obtained sufficient accuracy to support advanced spatial metrics like pass networks, zone occupancy, and tactical shape analysis in broadcast or training video.
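As an illustration of one such spatial metric, zone occupancy can be computed by binning calibrated player positions over a grid of pitch zones. This is a minimal sketch assuming the common 105 × 68 m pitch dimensions; the function name and zone grid are ours, not the app’s API.

```python
import numpy as np

def zone_occupancy(positions, bins=(6, 4), pitch=(105.0, 68.0)):
    """Share of samples a player spends in each pitch zone, given
    field coordinates (meters) produced by the calibration step.
    positions: (N, 2) array; bins: zones along length and width.
    Returns a (bins[0], bins[1]) array that sums to 1."""
    h, _, _ = np.histogram2d(
        positions[:, 0], positions[:, 1], bins=bins,
        range=[[0, pitch[0]], [0, pitch[1]]])
    return h / max(len(positions), 1)
```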
In future iterations, we plan to integrate a confidence-based fallback mechanism. If automatic calibration fails or confidence is low, the system will default to semi-automatic or manual input, maintaining reliability in challenging scenarios.

Figure 2: Homography-based automated calibration pipeline
Object detection and segmentation
At the heart of the AI pipeline is a fine-tuned, lightweight object segmentation model explicitly optimized for soccer scenarios. Its primary role is to identify and isolate the core entities present in the game environment:
- Players
- Referees
- The ball
These elements are key for all subsequent stages, such as tracking, identification, metric calculation, and overlay generation.
To further boost robustness, our pipeline uses instance segmentation, which allows us to detect bounding boxes and compute pixel-level masks for each object. These masks are essential for precise color extraction in team classification and for overlaying animations cleanly during video rendering.
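To illustrate why pixel-level masks matter for color extraction, the snippet below averages a player’s color using only the pixels inside the instance mask, so grass and background never pollute the sample. A minimal NumPy sketch; the helper name is ours.

```python
import numpy as np

def mask_mean_color(frame, mask):
    """Mean RGB over an instance mask, ignoring background pixels.
    frame: (H, W, 3) uint8 image; mask: (H, W) boolean player mask."""
    pixels = frame[mask].astype(float)  # (K, 3) masked pixels only
    return pixels.mean(axis=0)
```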
The model’s outputs are structured as custom GStreamer metadata, making them natively compatible with the downstream tracking and team clustering stages. This enables efficient memory sharing and avoids additional post-processing overhead, even in high frame rate scenarios.
By tightly coupling inference with GStreamer and leveraging GPU acceleration end-to-end, we ensure that object detection and segmentation scale effectively for both offline processing and live game analysis environments.

Figure 3: Detection and segmentation example
Tracking and re-identification
We implemented a robust, hybrid tracking system based on a state-of-the-art multi-object tracking algorithm to maintain consistent player identities across frames and recover trajectories even under occlusion or camera transitions.
This solution combines:
- Kalman filtering
- Appearance embeddings
- Track assignment optimization
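A single predict-and-update step of the Kalman filtering component can be sketched as follows, for a constant-velocity state model over a detection’s center point. The noise values and the 25 fps assumption are illustrative defaults, not the production tuning.

```python
import numpy as np

def kalman_step(x, P, z, dt=1 / 25, q=1.0, r=4.0):
    """One predict + update of a constant-velocity Kalman filter.
    State x = [px, py, vx, vy]; z = matched detection center (px, py).
    dt assumes 25 fps footage; q and r are process / measurement
    noise magnitudes chosen for illustration."""
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt          # position advances by velocity*dt
    H = np.zeros((2, 4))
    H[0, 0] = H[1, 1] = 1.0         # we only observe position
    # Predict
    x = F @ x
    P = F @ P @ F.T + q * np.eye(4)
    # Update with the matched detection
    S = H @ P @ H.T + r * np.eye(2)
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (np.asarray(z, float) - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P
```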
Re-identification plays a key role in handling occlusions. By storing appearance descriptors for each tracked player, the system can reassign identities even after visual disappearance, such as:
- Players crossing paths
- Quick camera pans or zooms
- Referees temporarily obstructing the view
Evaluation on the SoccerNet dataset (~7,000 images) showed high scores across Multiple Object Tracking Accuracy (MOTA), ID precision and recall (IDF1), and Higher Order Tracking Accuracy (HOTA) metrics.
The tracking system has also been integrated as a GStreamer plugin with native support for zero-copy video buffers, ensuring minimal latency and high scalability.
In live or post-game workflows, the output can be streamed, visualized, or exported as metadata for overlay rendering and statistical breakdowns.
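Conceptually, the re-identification step compares a new detection’s appearance embedding against the stored descriptors of lost tracks. A minimal cosine-similarity sketch; the 0.7 threshold and function shape are assumptions for illustration, not the production matcher.

```python
import numpy as np

def reassign_identity(new_embedding, track_gallery, threshold=0.7):
    """Match a new detection's appearance embedding against stored
    descriptors of lost tracks using cosine similarity.
    track_gallery: {track_id: embedding}. Returns the best matching
    track id, or None if nothing exceeds the threshold."""
    e = np.asarray(new_embedding, float)
    e = e / np.linalg.norm(e)
    best_id, best_sim = None, threshold
    for tid, g in track_gallery.items():
        sim = float(e @ (g / np.linalg.norm(g)))
        if sim > best_sim:
            best_id, best_sim = tid, sim
    return best_id
```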

Figure 4: Player tracking with occlusion re-ID
Team classification
Automatically distinguishing between teams in soccer footage is far from straightforward. Players may wear complex patterns, referees and goalkeepers often wear unique kits, and lighting or compression artifacts can distort color data.
To address these challenges, we developed a proprietary Multi-Level Object Clustering algorithm tailored to the nuances of broadcast sports footage.
Our approach has proven robust across:
- Striped or multi-colored kits (e.g., red-white stripes, blue-yellow gradients)
- Goalkeepers wearing distinct kits
- Daylight and floodlit matches
- Low-resolution or compressed video streams
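At its core, this kind of team separation can be approximated with color clustering. The sketch below runs a deterministic two-cluster Lloyd’s k-means over per-player jersey colors; it is a simplified, single-level stand-in for the proprietary multi-level algorithm, not the algorithm itself.

```python
import numpy as np

def cluster_teams(colors, iters=20):
    """Two-cluster Lloyd's k-means over per-player jersey colors.
    colors: (N, 3) array of RGB values extracted from player masks.
    Returns one team label (0 or 1) per player."""
    colors = np.asarray(colors, float)
    # Deterministic init: the two most mutually distant colors
    d0 = np.linalg.norm(colors[:, None] - colors[None], axis=2)
    i, j = np.unravel_index(d0.argmax(), d0.shape)
    centers = colors[[i, j]].copy()
    for _ in range(iters):
        # Assign each player to the nearest center, then re-center
        d = np.linalg.norm(colors[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = colors[labels == k].mean(axis=0)
    return labels
```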
The clustering results are cross-validated with tracking data, ensuring that players consistently retain their team label across time and camera cuts. This also helps correct misclassifications caused by rapid changes in posture or lighting.
This classification step supports visual overlays (e.g., highlighting teams in different colors) and feeds into higher-level analysis features, such as team formation recognition, off-the-ball movement, and inter-player distance metrics.

Figure 5: Clustering workflow diagram
Trajectory smoothing
Raw tracking data, mainly when derived from real-world footage, often exhibits jitter and inconsistencies. This is due to bounding box drift, camera shake, player occlusions, and detection inaccuracies. While sufficient for internal metrics, this raw data lacks the visual smoothness and continuity required for broadcast-quality overlays or professional analysis.
To address this, we implemented a modular trajectory smoothing engine that transforms discrete player positions into continuous, fluid paths. Our system supports four different smoothing strategies:
- Linear interpolation
- Cubic interpolation
- Spline interpolation
- Polynomial (fly-over)
Each method can be configured based on user preference or use case. The system allows the analyst to define:
- The number of interpolation points
- Smoothing tolerance thresholds
- Error correction strategies (e.g., skipping outliers or using confidence scores)
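As an example of the simplest of the four strategies, linear interpolation can resample a jittery trajectory onto a uniform timeline, and per-sample speed then falls out of the smoothed path. A NumPy-only sketch; the function name and defaults are ours, for illustration.

```python
import numpy as np

def smooth_linear(times, xs, ys, n_points=100):
    """Resample a discrete trajectory onto a uniform timeline using
    linear interpolation. times, xs, ys: per-detection timestamps and
    field coordinates (meters). Returns the resampled timeline, path,
    and per-sample speed (m/s), one of the downstream metrics that
    smoothing makes reliable."""
    t = np.linspace(times[0], times[-1], n_points)
    x = np.interp(t, times, xs)
    y = np.interp(t, times, ys)
    speed = np.hypot(np.gradient(x, t), np.gradient(y, t))
    return t, x, y, speed
```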
Why smoothing matters
Trajectory smoothing is not just aesthetic. It improves downstream analytics by enabling:
- Consistent player speed estimation
- Accurate heatmaps and influence zones
- Frame-perfect overlay rendering for telestration and animation
- Reliable input for behavior prediction models
After internal evaluations, the spline interpolation method emerged as the preferred default, balancing visual quality and computational efficiency. It was tested across various frame rates, video qualities, and gameplay styles (e.g., high-pressing vs. possession-heavy teams), consistently producing stable and readable trajectories.
Video editing with animations
To combine AI-powered insights with user creativity, we developed a hardware-accelerated non-linear video editor built on GStreamer. This editor acts as the final layer in the analytics pipeline, allowing analysts, coaches, and content creators to view, draw, annotate, and export tactical clips with advanced visual overlays.
Key features:
- Import video clips
- Add tracking overlays and animations
- Mix visual elements
- Export annotated clips
- Create manual annotations
The result is a complete tactical video toolkit that blends automated insight and manual expertise, giving sports professionals a powerful tool to communicate ideas visually—with clarity, control, and cinematic quality.
Conclusions
This project redefines tactical sports analysis by transforming standard video into actionable insights: no sensors, no manual tagging, just intelligent AI.
By combining deep learning, synthetic data, and real-time GStreamer pipelines, we’ve built a modular, hardware-accelerated platform capable of:
- Automatic detection, tracking, and team identification
- Real-time player metrics with smooth and reliable trajectories
- Editable video overlays ready for coaching, review, and broadcast
One of the project’s core objectives was to develop independent AI components capable of individually and efficiently executing specific tasks, such as detection, keypoint extraction, or 3D estimation. We successfully achieved this by encapsulating these components as either GStreamer (https://fluendo.com/technologies/gstreamer/) plugins or native C# NuGet packages. These elements were then integrated into a centralized task management infrastructure within the customer’s native environment.
This approach laid the basis for creating Fluendo’s Raven AI Engine, a unified, cross-platform, and multi-device engine designed to handle both AI execution and rendering. Raven supports seamless integration with GStreamer and enables scalable deployment across various use cases and sports domains.
Developed in close collaboration with LongoMatch, a key player in the sports industry, the system is already proving its value in production environments. It is designed to be easily adaptable to other team sports with minimal retraining, empowering coaches, analysts, and developers to extract insights faster, more accurately, and at scale.
Would you like to enhance your analysis pipeline? Contact us. We’re happy to help.