And we are under way
It's kick-off time and we are playing three at the back at the Bridge today
Some housekeeping to take care of at the start. Without the following sources and people this post would not have been possible! Credits to and References from: CHELSEA TV, FBREF, The Athletic, David Sumpter, FourFourTwo, Kevin Minkus, Corey Schafer, John Muller
As this is the very first newsletter in the series, I’ve included a short intro to each section below and what you can expect from it every week. I have to admit that this week’s edition is particularly long, as it includes all the groundwork that sets things up nicely for the coming weeks.
Match of the Week: Each week I will be analysing a Chelsea match. It will include my notes (taken during the match) followed by some statistical analysis relying on visualization techniques. I hope this will be an iterative process where each week I try something new and move on to more advanced techniques, relying not only on visualizations and stats but also on more sophisticated ML-based approaches as we progress. I expect this section to be our star player up front, with all the reading and learning efforts feeding into it over time. At the end there will be a weekly roundup of links to interesting media coverage around Chelsea.
Reading Summary: I have started reading aggressively, and even that sometimes doesn’t seem enough given the vastness of the field and the overwhelming rate at which new content is pushed out. This section will have a brief summary of my weekly reading of online blog posts, books and research papers related to soccer tactics and the use of data and analytics in the field.
Community Briefing: An attempt to grab some inspirational pieces of work from the exploding social media playground. This section will have references to popular and recent content from student enthusiasts, leading industry experts and academic researchers.
Self-learning Progress: I believe in life-long learning. Just as footballers build up intuition and polish their skills on the training ground, I will keep adding my personal learning progress here from time to time.
1 Match of the Week
EFL Cup: CHE v TOT | Date: 05/01/2022
Formation: Interchangeable 3-4-3 and 4-4-2 and their variations | Result: 2-0
Match Notes dump:
Wingbacks providing all the width with extremely flexible roles for midfielders and attackers
Going 4 at the back (yes, ironically enough, this is the 1st match analysis of this newsletter series and Tuchel decides to play four at the back). The result was that Spurs didn't know whether to mark the crowded and confusing midfield transitional play or track the runs in behind from deep, seemingly triggered at random but executed with great aplomb
Sarr-Rudiger partnership to counter the Son-Kane pairing was hugely successful. Sarr outperformed Rudiger on the night with

Key stats from the MoTM on the night:
Looking at some of the open-play situations from the match one could conclude:
Spurs were so focused on pressing the players on and around the ball that they left acres of space on the opposite flank, and a simple diagonal switch of play was always on. Ziyech found himself collecting such passes from Havertz on two separate occasions, as seen above. On the 1st occasion he had a go at goal, a decent attempt that curled past the far post; on the other it resulted in a dangerous Chelsea break.
On other occasions he found himself running away from goal, thereby dragging Spurs’ defenders out of position and allowing Azpilicueta to underlap and cross into the box from the oh-so-crucial right half-space.
In terms of the key underlying stats, Ziyech had the most shots (3), fouls drawn (3) and crosses (6) in the match, despite playing only 78 of the 90 minutes before being subbed off. For historical comparison, and as an exercise to get acquainted with the data, here’s a quick look at how Ziyech has shown signs of improvement this season. As evident from the table below, his non-penalty expected goals and shots per 90 minutes have improved markedly compared to last season. He’s shooting from closer and more threatening areas, and his passing has become tighter compared to the more hopeful dinks of last season. Finally, his relatively high creativity, chance-creation and crossing stats have stayed more or less consistent across the two terms.

*Stats’ definitions (as defined on FBREF):
avgShDist: Average distance, in yards, from goal of all shots taken
npxG: Non-Penalty Expected Goals
prgPassDist: Total distance, in yards, that completed passes have traveled towards the opponent's goal.
xA: xG Assisted, i.e. the xG which follows a pass that assists a shot
sca: The two offensive actions directly leading to a shot, such as passes, dribbles and drawing fouls.
gca: The two offensive actions directly leading to a goal, such as passes, dribbles and drawing fouls.
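For anyone new to per-90 numbers, here is a minimal Python sketch of the normalisation behind them, using Ziyech's raw counts from this match (3 shots and 6 crosses in 78 minutes). The function names are my own, and the shot distances are made-up values purely to illustrate avgShDist; this is not how FBREF builds its tables internally.

```python
def per_90(total: float, minutes: float) -> float:
    """Scale a raw count to a per-90-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total * 90.0 / minutes


def avg_shot_distance(distances_yards: list[float]) -> float:
    """Mean distance from goal (in yards) across all shots taken."""
    if not distances_yards:
        raise ValueError("need at least one shot")
    return sum(distances_yards) / len(distances_yards)


# Raw counts from the match notes: 78 minutes played.
ziyech_minutes = 78
shots_p90 = per_90(3, ziyech_minutes)
crosses_p90 = per_90(6, ziyech_minutes)

# Hypothetical shot distances in yards (illustrative only, not match data).
example_distances = [18.0, 22.0, 14.0]
example_avg = avg_shot_distance(example_distances)

print(f"shots per 90:   {shots_p90:.2f}")
print(f"crosses per 90: {crosses_p90:.2f}")
print(f"avgShDist (hypothetical): {example_avg:.1f} yards")
```

A single match is a tiny sample, so per-90 rates like these only start to mean something once they are aggregated over many games.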
Final verdict:
Ziyech finally showing some magic and Chelsea dominant from start to finish with the fluidity in the formation making things awkward for Spurs.
Having said that, quality chances were wasted; Chelsea could have and should have put the game to bed on more than one occasion (Lukaku far-post header, Pulisic close-range effort and Werner finesse shot from inside the 18-yard box). This might prove costly at the end of the 180 minutes of the tie.
Deep attacking runs from Mount; this sort of influence is tough to track with solely event-data and is something that would be interesting to deep-dive into in the future.
Saul solid in the middle with 3 tackles (2nd highest) and 3 interceptions (highest). Finally he’s starting to show signs of being the player Chelsea hoped they had signed from Atlético. Playing in an orthodox midfield 4 might have helped, but he’s proved his class in a London derby against an in-form Spurs side.
Chelsea media coverage from the past week:
Lukaku drama in the lead-up to the game: Lukaku’s role at Chelsea is different to how he played at Inter
Return of Conte to the Bridge: Antonio Conte’s Chelsea sacking and the tribunal that ended in a £26m payout
Saul impresses: Saul has settled – and is starting to show what he can bring to the party
2 Reading Summary
Soccermatics by David Sumpter:
I’ve been reading this book for a while now, and during the week I renewed its loan from a nearby library for the 8th consecutive time! I finished off Chapter 11 (Bet Against the Masses) and started Chapter 12 (Putting my Money where my Mouth is). Both chapters talk about how fans go about gambling and betting during matches.
Chapter 11 talks about the inherent nature of the betting industry and how companies rely on balancing the odds so as to maximize their profits. The author says “The bookmakers use the collective wisdom of the crowd to set their spreads - and, smart as you may be, you can’t get in quickly enough to turn your knowledge into money.” Drawing on practical examples from experiments conducted by him and other researchers, he concludes that none of the crowds (with their questionable ability to get the betting outcome correct when averaged out), the models (predictive models that rank teams and players based on indexes) or the experts of the game (pundits, journalists and coaches) can make long-term predictions that cope with the huge amount of uncertainty and randomness typically associated with soccer. Having said that, he ends on an optimistic note, suggesting that a mathematically competent individual could have a chance of making a profit on football betting. More on this next week, when in Chapter 12 he takes us through some of his hypothetical betting strategies.
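To make the "balancing the odds" idea a bit more concrete, here is a small Python sketch of how a bookmaker's margin shows up in decimal odds. The odds are made up for illustration (they are not from the book or from any real market) and the function name is my own. The idea is simply that the implied probabilities, 1 divided by the odds, add up to more than 1, and the excess is commonly read as the bookmaker's margin.

```python
def implied_probabilities(decimal_odds: dict[str, float]):
    """Return raw implied probabilities, their sum, and margin-free probabilities."""
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    book_sum = sum(raw.values())  # above 1 when a margin is built into the odds
    fair = {outcome: p / book_sum for outcome, p in raw.items()}
    return raw, book_sum, fair


# Hypothetical decimal odds for a home win, draw and away win.
odds = {"home": 2.10, "draw": 3.40, "away": 3.60}
raw, book_sum, fair = implied_probabilities(odds)

print(f"sum of implied probabilities: {book_sum:.3f}")
print(f"bookmaker margin: {(book_sum - 1) * 100:.1f}%")
for outcome, p in fair.items():
    print(f"{outcome}: {p:.3f} after removing the margin")
```

Rescaling the probabilities proportionally is only one simple way to strip out the margin, and I am not claiming it is the approach used in the book.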
Airport shopping:
On my way back from India, I had a lengthy wait at London Heathrow before my connection to Dublin. I used this time to pick up a couple of books from my 2022 reading list at a bargain price! The first is “The Numbers Game (Chris Anderson and David Sally)”, which uses data and facts to debunk some of the long-held conceptions about the way football has been perceived for the last few decades. The other is “The Mixer (Michael Cox)”, which looks at the history and evolution of the tactics used by Premier League teams. Having read the intros to both books on the flight to Dublin, I look forward to reading them later in the year.
3 Community Briefing
Inspirational work:
This week I’ve picked an innovative piece of work from The Athletic’s John Muller. He relies on UMAP (Uniform Manifold Approximation and Projection) for dimension reduction, followed by GMM (Gaussian Mixture Model) clustering. Here is the article series, which uses these two techniques to cluster some of the top teams in Europe and the Premier League based on their Possession, Passing and Defending stats:

It would be interesting to see how these techniques would visualize individual players or coaches based on underlying stats of their playing and coaching styles respectively. This goes straight into my data-science-notes repository!
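As a note to self on the clustering half of that pipeline, here is a minimal pure-Python sketch of a Gaussian mixture fitted with the EM algorithm. To be clear about what it is not: it skips the UMAP step entirely, it works on a single made-up feature (hypothetical possession percentages for ten imaginary teams), and it is in no way John Muller's actual method or data. In practice you would more likely reach for libraries such as umap-learn and scikit-learn's GaussianMixture rather than hand-rolling this.

```python
import math


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a 1-D normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(values: list[float], n_iter: int = 50):
    """Fit a two-component 1-D Gaussian mixture with the EM algorithm."""
    n = len(values)
    overall_mean = sum(values) / n
    overall_var = sum((v - overall_mean) ** 2 for v in values) / n
    means = [min(values), max(values)]  # start the components at the extremes
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in values:
            dens = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a collapsing component
    # Hard labels: assign each point to its most responsible component.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, variances, weights, labels


# Hypothetical possession percentages (illustrative only, not real team data).
possession = [38.0, 41.0, 43.0, 44.0, 45.0, 61.0, 63.0, 64.0, 66.0, 68.0]
means, variances, weights, labels = fit_gmm_1d(possession)

print("component means:", [round(m, 1) for m in means])
print("cluster labels: ", labels)
```

Unlike k-means, the mixture model gives each team a soft membership in every cluster, which seems like a natural fit for teams that sit between two styles.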
People to follow:
Even if you don’t have a subscription to The Athletic, you can follow John Muller on Twitter, where he regularly posts updates on his personal work. By the looks of it, he also has his own newsletter if you want to give it a try.
4 Self-learning Progress
Github updates:
Check out the latest set of updates to my main GitHub repo. Please feel free to give it a follow to keep up to date with my weekly updates and to get going on the practical side of things that supports the content published here.
YouTube:
A couple of nice educational YouTube videos to call out this week:
https://www.youtube.com/playlist?list=PL-osiE80TeTsqhIuOqKhwlXsIBIdSeYtc
I was doing some revision of Python OOP concepts heading into week 1 of my new job and this playlist from Corey Schafer was exactly what I was looking for.
An introductory video from FourFourTwo for anyone looking to understand the roots of Soccer Analytics and, more broadly, the rise of Sports Analytics since the start of the century.
Podcasts:
Lastly, a quick shout-out to a brilliant podcast series on Bayesian Stats hosted by Alexandre Andorra. The episode I listened to was #25, where Kevin Minkus was the guest and the two discussed the popularity of Bayesian methods in Football Analytics. They talked about many different techniques, ranging from multi-armed bandits to hierarchical models.
Quote of the week:
"The Universe is under no obligation to make sense to you" — Neil deGrasse Tyson


