Towards the end of the season I decided to try and analyse the Liverpool players based solely on goals for and against. The most straight-forward way to do it (to me) is to break down each player's game to goals for(gf) and goals (ga) against per minute and compare that to the expected gf and ga .
Here's how it's done.
Expected Goals Against and Expected Goals For
First of all, we recognize that there is usually a distinct disparity in GF and GA when a team is at home versus when it travels. So, to figure out what the expected GF and GA should be against a particular opponent, we need to track all of their individual games, and then calculate average GA and GF for both home and away games. When we calculate that average, we have to be careful to not include the results when Liverpool was the opponent.
For example, Manchester City's GF and GA stats (according to statto.com):
Home GA: 0.68
Away GA: 1.26
Home GF: 3.32
Away GF: 2.05
But these include the Liverpool games. For our purposes, we remove the Liverpool results and arrive at the "true" expected GA and GF:
Home GA: 0.66
Away GA: 1.17
Home GF: 3.39
Away GF: 2.06
This gives us the expected gf and ga per game (which I assume is 90 minutes for simplicity), for home and away. It should be noted that these values can and do change over the course of a season, so an individual player's "score" will also change as the expected GF and GA change. Depending on how you look at it, this can be considered good or bad, but for the sake of (at least a little) brevity I won't go into that here. Suffice it to say that in this system, a player's performance score from a game in August can change as the season progresses.
Actual Goals for and Actual Goals Against
For each match, I load the Liverpool players' results into a database. I include the minutes they played as well as the GF and GA while they were on the field.
Let's look at the 4/13 game against Man City (at Anfield) as an example.
The score at the end of the game was 3-2. Here are two players' results:
lastName | pos | gf | ga | minutes | weighted_gf | expected_gf | weighted_ga | expected_ga | date_id |
Suarez | ST | 3 | 2 | 90 | 3 | 1.167 | 2 | 2.056 | 20140413 |
Allen | MID | 1 | 0 | 24 | 3.75 | 1.167 | 0 | 2.056 | 20140413 |
Luis Suarez played 90 minutes. For a 90 minute game, the expected GF is 1.17. The expected GA is 2.06. We scored 3 and Suarez played all 90, so his "actual/weighted gf" is 3. His "weight/actual ga" is, of course 2.
Joe Allen played only 24 minutes, and during that time we scored one while allowing 0.
So, his weighted gf is: 90 / 24 * 1 = 3.75. His weighted ga is: 90/24 * 0 = 0.
To arrive at a "score," you simply subtract the expected values weighted/actual values. E.g.
Suarez:
actualGF - expectedGF = net_GF_per_90 => 3 - 1.17 = 1.83
actualGa - expectedGA = net_GA_per_90 => 2 - 2.06 = -0.06
Luis Suarez got credit for scoring 1.83 goals more than you'd get from the average EPL team playing against Man City at home. He got credit for giving up 0.06 goals less than you'd get from the average EPL team playing Man City at home.
Allen:
net_GF_per_90 => 3.75 - 1.17 = 2.58
net_GA_per_90 => 0 - 2.06 = -2.06
Finally I sum the two together for a total net, that I usually denote as net_p90 in the charts:
Suarez: 1.83 - (-0.06) = 1.89
Allen: 2.58 - (-2.06) = 4.64
As you can see, for the same game two players had very different scores.
The data used to create the charts comes from the database. Most of the fields in the final chart are calculated on the fly from that data.
Here's a sample - if you don't know T-SQL this will look mostly like gibberish. If you know a little T-SQL this will look like almost-not-gibberish. If you know a lot of T-SQL I apologize for this (but in my defense no one is paying me to do it):
select
lastName
,pos
,round(((1.0*sum(gf)/sum(minutes)) - (sum(expected_gf)/count(*)/90.0))*90.0,2) net_gfp90
,round(((1.0*sum(ga)/sum(minutes)) - (sum(expected_ga)/count(*)/90.0))*90.0,2) net_gap90
,round((((1.0*sum(gf)/sum(minutes)) - (sum(expected_gf)/count(*)/90.0))*90.0)
- (((1.0*sum(ga)/sum(minutes)) - (sum(expected_ga)/count(*)/90.0))*90.0),2) net_p90
,sum(minutes) minutes
,sum(appearance) appearances
--,round(avg(points*1.0),2) avg_points
-- ,(sum(minutes)/sum(appearance)*1.0)/90.0*round(avg(talis_points*1.0),2) talisman_coeff
from (
select
player.lastName
,player.primaryPosition as pos
,A.gf
,A.ga
,A.minutes
,A.appearance
,case
when A.points = 0 then -2.625
else A.points
end as talis_points
,A.points as points
,(A.gf*1.0)/minutes actual_gf_per_minute
,case
when match.home_away='h' then team.adj_away_ga
when match.home_away='a' then team.adj_home_ga
end as expected_gf
,(A.ga*1.0)/minutes actual_ga_per_minute
,case
when match.home_away='h' then team.adj_away_gf
when match.home_away='a' then team.adj_home_gf
end as expected_ga
from factFixtures as A
inner join dimTeam as team on team.id=A.opponent_id
inner join dimMatch as match on match.id=A.match_id
inner join dimPlayer as player on A.player_id=player.id
where match.date_id >= 20131207--between 20131215 and 20131229
) tblA
--where /*pos in ('M') or */lastName='Mignolet'
group by lastName,pos having sum(appearance) > 9--having sum(minutes) > 605
order by 5 desc;
That produces this table:
lastName | pos | net_gfp90 | net_gap90 | net_p90 | minutes | appearances |
Henderson | MID | 1.8 | -0.22 | 2.02 | 1890 | 21 |
Flanagan | FB | 1.82 | -0.04 | 1.85 | 1554 | 19 |
Toure | CB | 1.81 | 0 | 1.81 | 611 | 10 |
Cissokho | FB | 1.76 | -0.03 | 1.78 | 783 | 10 |
Sturridge | ST | 2.03 | 0.28 | 1.74 | 1262 | 16 |
Suarez | ST | 1.71 | 0.02 | 1.7 | 2155 | 24 |
Skrtel | CB | 1.71 | 0.01 | 1.69 | 2160 | 24 |
Mignolet | GK | 1.71 | 0.01 | 1.69 | 2160 | 24 |
Johnson | FB | 1.75 | 0.13 | 1.63 | 1756 | 20 |
Sterling | MID | 1.59 | -0.01 | 1.6 | 1805 | 24 |
Gerrard | MID | 1.74 | 0.23 | 1.51 | 1664 | 20 |
Allen | MID | 1.42 | -0.02 | 1.45 | 1249 | 19 |
Coutinho | MID | 1.67 | 0.28 | 1.39 | 1587 | 23 |
Agger | CB | 1.33 | -0.03 | 1.36 | 889 | 12 |
Sakho | CB | 1.43 | 0.08 | 1.35 | 898 | 10 |
Leiva | MID | 1.46 | 0.11 | 1.34 | 894 | 16 |
Moses | MID | 0.2 | 0.3 | -0.1 | 209 | 11 |
So there you have it.
Each week, I take all of the league results and load them into my database. I've automated most of that but there's still a little manual work (like updating the league positions of the teams so I can do splits like "Against the Top 5 Teams blah blah").
Then I load in the Liverpool result with each player's data and run some scripts to calculate a few basic fields.
The rest is writing SQL queries to pull whatever I'm interested in.
Hopefully that makes sense - if not, feel free to leave a question!