Posts by Collection

portfolio

Portfolio item number 1

Short description of portfolio item number 1

Portfolio item number 2

Short description of portfolio item number 2

publications

SoccerNet 2022 challenges results

Published in ACM Workshop on Multimedia Content Analysis in Sports, 2022

Abstract: The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team. In 2022, the challenges were composed of 6 vision-based tasks: (1) action spotting, focusing on retrieving action timestamps in long untrimmed videos, (2) replay grounding, focusing on retrieving the live moment of an action shown in a replay, (3) pitch localization, focusing on detecting line and goal part elements, (4) camera calibration, dedicated to retrieving the intrinsic and extrinsic camera parameters, (5) player re-identification, focusing on retrieving the same players across multiple views, and (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams. Compared to last year’s challenges, tasks (1-2) had their evaluation metrics redefined to consider tighter temporal accuracies, and tasks (3-6) were novel, including their underlying data and annotations.

Recommended citation: Giancola, S., Cioppa, A., Deliège, A., Magera, F., Somers, V., Kang, L., ... & Li, Z. (2022, October). SoccerNet 2022 challenges results. In Proceedings of the 5th International ACM Workshop on Multimedia Content Analysis in Sports (pp. 75-86).
arXiv

SoccerNet 2023 challenges results

Published in Sports Engineering, 2023

Abstract: The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, focusing on retrieving all timestamps related to global actions in soccer, (2) ball action spotting, focusing on retrieving all timestamps related to the soccer ball change of state, and (3) dense video captioning, focusing on describing the broadcast with natural language and anchored timestamps. The second theme, field understanding, relates to the single task of (4) camera calibration, focusing on retrieving the intrinsic and extrinsic camera parameters from images. The third and last theme, player understanding, is composed of three low-level tasks related to extracting information about the players: (5) re-identification, focusing on retrieving the same players across multiple views, (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams, and (7) jersey number recognition, focusing on recognizing the jersey number of players from tracklets. Compared to the previous editions of the SoccerNet challenges, tasks (2-3-7) are novel, including new annotations and data, task (4) was enhanced with more data and annotations, and task (6) now focuses on end-to-end approaches.

Recommended citation: Cioppa, A., Giancola, S., Somers, V., Magera, F., Zhou, X., Mkhallati, H., ... & Meng, Z. (2023). SoccerNet 2023 challenges results. arXiv preprint arXiv:2309.06006.
arXiv

Video-based Skill Assessment for Golf: Estimating Golf Handicap

Published in ACM Workshop on Multimedia Content Analysis in Sports, 2023

Abstract: Automated skill assessment in sports using video-based analysis holds great potential for revolutionizing coaching methodologies. This paper focuses on the problem of skill determination in golfers by leveraging deep learning models applied to a large database of video recordings of golf swings. We investigate different regression, ranking and classification based methods and compare to a simple baseline approach. The performance is evaluated using mean squared error (MSE) as well as computing the percentages of correctly ranked pairs based on the Kendall correlation. Our results demonstrate an improvement over the baseline, with a 35% lower mean squared error and 68% correctly ranked pairs. However, achieving fine-grained skill assessment remains challenging. This work contributes to the development of AI-driven coaching systems and advances the understanding of video-based skill determination in the context of golf.

Recommended citation: Ingwersen, C. K., Xarles, A., Clapés, A., Madadi, M., Jensen, J. N., Hannemose, M. R., ... & Escalera, S. (2023, October). Video-based Skill Assessment for Golf: Estimating Golf Handicap. In Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports (pp. 31-39).
arXiv

ASTRA: An Action Spotting TRAnsformer for Soccer Videos

Published in ACM Workshop on Multimedia Content Analysis in Sports, 2023

Abstract: In this paper, we introduce ASTRA, a Transformer-based model designed for the task of Action Spotting in soccer matches. ASTRA addresses several challenges inherent in the task and dataset, including the requirement for precise action localization, the presence of a long-tail data distribution, non-visibility in certain actions, and inherent label noise. To do so, ASTRA incorporates (a) a Transformer encoder-decoder architecture to achieve the desired output temporal resolution and to produce precise predictions, (b) a balanced mixup strategy to handle the long-tail distribution of the data, (c) an uncertainty-aware displacement head to capture the label variability, and (d) input audio signal to enhance detection of non-visible actions. Results demonstrate the effectiveness of ASTRA, achieving a tight Average-mAP of 66.82 on the test set. Moreover, in the SoccerNet 2023 Action Spotting challenge, we secure the 3rd position with an Average-mAP of 70.21 on the challenge set.

Recommended citation: Xarles, A., Escalera, S., Moeslund, T. B., & Clapés, A. (2023, October). ASTRA: An Action Spotting TRAnsformer for Soccer Videos. In Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports (pp. 93-102).
arXiv | Code | Page

T-DEED: Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in Sports Videos

Published in CVPRW - CVSports, 2024

Abstract: In this paper, we introduce T-DEED, a Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in sports videos. T-DEED addresses multiple challenges in the task, including the need for discriminability among frame representations, high output temporal resolution to maintain prediction precision, and the necessity to capture information at different temporal scales to handle events with varying dynamics. It tackles these challenges through its specifically designed architecture, featuring an encoder-decoder for leveraging multiple temporal scales and achieving high output temporal resolution, along with temporal modules designed to increase token discriminability. Leveraging these characteristics, T-DEED achieves SOTA performance on the FigureSkating and FineDiving datasets. An extended version of the paper can be found here: ExtendedPaper.

Recommended citation: Xarles, A., Escalera, S., Moeslund, T. B., & Clapés, A. (2024). T-DEED: Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in Sports Videos. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
arXiv | Code | Page

Action Anticipation from SoccerNet Football Video Broadcasts

Published in CVPRW - CVSports, 2025

Abstract: Artificial intelligence has revolutionized the way we analyze sports videos, whether to understand the actions of games in long untrimmed videos or to anticipate the player’s motion in future frames. Despite these efforts, little attention has been given to anticipating game actions before they occur. In this work, we introduce the task of action anticipation for football broadcast videos, which consists in predicting future actions in unobserved future frames, within a five- or ten-second anticipation window. To benchmark this task, we release a new dataset, namely the SoccerNet Ball Action Anticipation (SN-BAA) dataset, based on SoccerNet Ball Action Spotting. Additionally, we propose a Football Action ANticipation TRAnsformer (FAANTRA), a baseline method that adapts FUTR, a state-of-the-art action anticipation model, to predict ball-related actions. To evaluate action anticipation, we introduce new metrics, including mAP@delta, which evaluates the temporal precision of predicted future actions, as well as mAP@infty, which evaluates their occurrence within the anticipation window. We also conduct extensive ablation studies to examine the impact of various task settings, input configurations, and model architectures. Experimental results highlight both the feasibility and challenges of action anticipation in football videos, providing valuable insights into the design of predictive models for sports analytics. By forecasting actions before they unfold, our work will enable applications in automated broadcasting, tactical analysis, and player decision-making. We will release our dataset, baseline, and benchmark publicly, to promote reproducibility and encourage further research.

Recommended citation: Dalal, M., Xarles, A., Cioppa, A., Giancola, S., Van Droogenbroeck, M., Ghanem, B., ... & Moeslund, T. B. (2025). Action Anticipation from SoccerNet Football Video Broadcasts. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
arXiv | Code

Action Valuation in Sports: A Survey

Published in CVPRW - CVSports, 2025

Abstract: Action Valuation (AV) has emerged as a key topic in Sports Analytics, offering valuable insights by assigning scores to individual actions based on their contribution to desired outcomes. Despite a few surveys addressing related concepts such as Player Valuation, there is no comprehensive review dedicated to an in-depth analysis of AV across different sports. In this survey, we introduce a taxonomy with nine dimensions related to the AV task, encompassing data, methodological approaches, evaluation techniques, and practical applications. Through this analysis, we aim to identify the essential characteristics of effective AV methods, highlight existing gaps in research, and propose future directions for advancing the field.

Recommended citation: Xarles, A., Escalera, S., Moeslund, T. B., & Clapés, A. (2025). Action Valuation in Sports: A Survey. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
arXiv | Page

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015