Football, one of the most popular sports on earth, is played by 22 players competing for possession of the ball.
Watching football is a big part of the experience, but the data we gather from matches can also help us understand why games unfold the way they do.
In this post, I describe my contribution to one subproblem of football analysis: interpreting matches from television-like (broadcast-style) video feeds. To learn more, please visit baanstepball.com.
The problem
Moving cameras make it difficult to extract positional and semantic information. Fixed cameras placed throughout the venue would make this much easier, but budget constraints and permission restrictions ruled that out at real stadiums. Fortunately, there are several ways to process ordinary video footage if you don't want to leave your seat or are on a tight budget.
In this case, what should you do?
To keep the task from becoming overwhelming, we broke it into smaller, more manageable pieces, as any textbook programmer would.
This gave us the following subtasks:
- Camera view estimation (reference point estimation and homography estimation): projecting the players' positions onto a two-dimensional view of the pitch.
- Object detection: identifying the players, the ball, and the officials, and locating them in the frame.
- Object tracking (also known as entity tracking): following each detected entity across frames, which is essential for this project.
- Player identification: which player is which, from one frame to the next?
- Team identification: which team does each player represent, and how can we find out?
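The projection step in the first subtask can be illustrated with a minimal sketch. It assumes a 3×3 homography `H` has already been estimated from reference points (for example with OpenCV's `cv2.findHomography`); the function name `project_to_pitch` and the pitch coordinate units are hypothetical, not taken from the original project.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def project_to_pitch(H: Sequence[Sequence[float]], points: List[Point]) -> List[Point]:
    """Map image pixel coordinates (u, v) to 2D pitch coordinates using a 3x3 homography H.

    Hypothetical helper: H is assumed to be estimated elsewhere from reference points.
    """
    projected = []
    for u, v in points:
        # Homogeneous transform: [x', y', w'] = H @ [u, v, 1]
        x = H[0][0] * u + H[0][1] * v + H[0][2]
        y = H[1][0] * u + H[1][1] * v + H[1][2]
        w = H[2][0] * u + H[2][1] * v + H[2][2]
        if abs(w) < 1e-12:
            raise ValueError(f"point {(u, v)} maps to infinity under H")
        # Divide by w to return from homogeneous to Cartesian pitch coordinates.
        projected.append((x / w, y / w))
    return projected
```

In practice you would project a point at each player's feet (for example the bottom centre of the bounding box), since that is the point actually touching the pitch plane the homography describes.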
Next, we look at how these tasks fit together to produce positional and semantic information.
For each frame in the sequence, we detect the field and the entities on it. Because detections in consecutive frames are nearly identical, tracking links them into continuous paths.
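The post does not say which tracker was used. As an assumption, here is a minimal greedy tracker based on intersection over union (IoU): each detection inherits the ID of the track it overlaps most in the previous frame, or starts a new track. The class `IoUTracker` and its `min_iou` threshold are hypothetical.

```python
from itertools import count
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class IoUTracker:
    """Hypothetical greedy frame-to-frame tracker (not the original project's code)."""

    def __init__(self, min_iou: float = 0.3):
        self.min_iou = min_iou
        self.tracks: Dict[int, Box] = {}  # track ID -> box seen in the previous frame
        self._ids = count()

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        # Score every (track, detection) pair, then match greedily from best overlap down.
        pairs = sorted(
            ((iou(box, det), tid, j)
             for tid, box in self.tracks.items()
             for j, det in enumerate(detections)),
            reverse=True,
        )
        assigned: Dict[int, Box] = {}
        used = set()
        for score, tid, j in pairs:
            if score < self.min_iou:
                break  # remaining pairs overlap too little to be the same entity
            if tid in assigned or j in used:
                continue
            assigned[tid] = detections[j]
            used.add(j)
        # Unmatched detections start new tracks; unmatched old tracks are dropped.
        for j, det in enumerate(detections):
            if j not in used:
                assigned[next(self._ids)] = det
        self.tracks = assigned
        return assigned
```

Greedy matching keeps the sketch short; a fuller tracker would typically use optimal assignment and keep unmatched tracks alive for a few frames so that briefly occluded players are not given new IDs.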
Similarly, we estimate the camera's view and project each entity's location onto the pitch. By identifying each player and assigning him to a team, we can also track individual performance.
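One common way to assign players to teams, assumed here rather than taken from the post, is to cluster the average shirt color of each detected player into two groups. The sketch below runs 2-means clustering on mean (R, G, B) values; `assign_teams` is a hypothetical name, and referees or goalkeepers, whose colors differ from both teams, would need extra handling.

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]  # mean (R, G, B) of a player's shirt region


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_teams(colors: List[Color], iterations: int = 20) -> List[int]:
    """Split players into two teams (labels 0/1) by 2-means clustering of shirt colors."""
    if len(colors) < 2:
        return [0] * len(colors)
    # Deterministic init: the first color, and the color farthest from it.
    first = colors[0]
    centers = [first, max(colors, key=lambda c: _dist2(c, first))]
    labels = [0] * len(colors)
    for _ in range(iterations):
        # Assignment step: each player joins the nearer color center.
        labels = [0 if _dist2(c, centers[0]) <= _dist2(c, centers[1]) else 1 for c in colors]
        # Update step: move each center to the mean color of its members.
        for k in (0, 1):
            members = [c for c, lab in zip(colors, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels
```

The labels 0 and 1 are arbitrary; mapping them to actual team names still requires one known reference, such as a manually labeled player.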
We repeat this process frame by frame until the video ends. Once all frames have been collected, we smooth the data by performing ‘backward adjustments’: revisiting earlier frames so that the detections and trajectory paths stay consistent with one another.
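The post does not specify how the backward adjustments work. As one assumed stand-in, here is forward-backward exponential smoothing of a single trajectory: a forward pass followed by a backward pass, so each smoothed position depends on both earlier and later frames. That is only possible offline, once all frames have been collected. The function names and the `alpha` parameter are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _ema(points: List[Point], alpha: float) -> List[Point]:
    """One exponential smoothing pass: s_t = alpha * p_t + (1 - alpha) * s_{t-1}."""
    out = [points[0]]
    for x, y in points[1:]:
        px, py = out[-1]
        out.append((alpha * x + (1 - alpha) * px, alpha * y + (1 - alpha) * py))
    return out


def smooth_trajectory(points: List[Point], alpha: float = 0.5) -> List[Point]:
    """Smooth forward, then run the same filter backward over the result.

    The backward pass removes the lag a single forward pass would introduce.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if len(points) < 2:
        return list(points)
    forward = _ema(points, alpha)
    backward = _ema(forward[::-1], alpha)
    return backward[::-1]
```

Smaller values of `alpha` smooth more aggressively; `alpha = 1` returns the trajectory unchanged. A single-frame jump in a player's position, typical of a noisy detection, is strongly damped while steady motion is preserved.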
Each of these steps runs as soon as a frame is fed into the system.
A method for detecting objects
Anyone who works with machine learning quickly notices that good labeled data is very hard to find, which makes a pre-trained object detector an attractive starting point. One popular option is YOLOv3.
With the pre-trained network, cropping the frame gave disappointing results. Because accuracy matters more than speed here, we fed YOLO the image at its original resolution. This way, the ball can be detected along with the players and referees near it.
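A quick calculation suggests why resolution matters so much for small objects like the ball. The numbers are assumptions, not measurements from the post: a ball about 10 pixels wide in a 1920×1080 broadcast frame, a 416×416 network input (one of the input sizes commonly used with YOLOv3), aspect-preserving (letterbox) resizing, and YOLOv3's finest detection stride of 8 pixels. The helper `ball_footprint` is hypothetical.

```python
from typing import Tuple


def ball_footprint(ball_px: float, frame_w: int, frame_h: int,
                   net_size: int, stride: int = 8) -> Tuple[float, float]:
    """Return the ball's width after letterbox-resizing the frame to net_size x net_size,
    and that width as a fraction of one grid cell at the given stride."""
    # Letterboxing scales by the tighter dimension so the aspect ratio is preserved.
    scale = min(net_size / frame_w, net_size / frame_h)
    resized = ball_px * scale
    return resized, resized / stride


small = ball_footprint(10, 1920, 1080, net_size=416)
full = ball_footprint(10, 1920, 1080, net_size=1920)
```

Under these assumptions the ball shrinks to roughly 2.2 pixels at a 416 input, about a quarter of a single stride-8 grid cell, whereas at full resolution it keeps its 10 pixels and spans 1.25 cells. The trade-off is that a larger input makes every forward pass considerably slower.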