FanPost

Player Analysis - Methodology (for AJ)


Towards the end of the season I decided to try and analyse the Liverpool players based solely on goals for and against. The most straight-forward way to do it (to me) is to break down each player's game to goals for(gf) and goals (ga) against per minute and compare that to the expected gf and ga .

Here's how it's done.

Expected Goals Against and Expected Goals For

First of all, we recognize that there is usually a distinct disparity in GF and GA when a team is at home versus when it travels. So, to figure out what the expected GF and GA should be against a particular opponent, we need to track all of their individual games, and then calculate average GA and GF for both home and away games. When we calculate that average, we have to be careful to not include the results when Liverpool was the opponent.

For example, Manchester City's GF and GA stats (according to statto.com):

Home GA: 0.68

Away GA: 1.26

Home GF: 3.32

Away GF: 2.05

But these include the Liverpool games. For our purposes, we remove the Liverpool results and arrive at the "true" expected GA and GF:

Home GA: 0.66

Away GA: 1.17

Home GF: 3.39

Away GF: 2.06

This gives us the expected gf and ga per game (which I assume is 90 minutes for simplicity), for home and away. It should be noted that these values can and do change over the course of a season, so an individual player's "score" will also change as the expected GF and GA change. Depending on how you look at it, this can be considered good or bad, but for the sake of (at least a little) brevity I won't go into that here. Suffice it to say that in this system, a player's performance score from a game in August can change as the season progresses.

Actual Goals for and Actual Goals Against

For each match, I load the Liverpool players' results into a database. I include the minutes they played as well as the GF and GA while they were on the field.

Let's look at the 4/13 game against Man City (at Anfield) as an example.

The score at the end of the game was 3-2. Here are two players' results:

lastName pos gf ga minutes weighted_gf expected_gf weighted_ga expected_ga date_id
Suarez ST 3 2 90 3 1.167 2 2.056 20140413
Allen MID 1 0 24 3.75 1.167 0 2.056 20140413

Luis Suarez played 90 minutes. For a 90 minute game, the expected GF is 1.17. The expected GA is 2.06. We scored 3 and Suarez played all 90, so his "actual/weighted gf" is 3. His "weight/actual ga" is, of course 2.

Joe Allen played only 24 minutes, and during that time we scored one while allowing 0.

So, his weighted gf is: 90 / 24 * 1 = 3.75. His weighted ga is: 90/24 * 0 = 0.

To arrive at a "score," you simply subtract the expected values weighted/actual values. E.g.

Suarez:

actualGF - expectedGF = net_GF_per_90 => 3 - 1.17 = 1.83

actualGa - expectedGA = net_GA_per_90 => 2 - 2.06 = -0.06

Luis Suarez got credit for scoring 1.83 goals more than you'd get from the average EPL team playing against Man City at home. He got credit for giving up 0.06 goals less than you'd get from the average EPL team playing Man City at home.

Allen:

net_GF_per_90 => 3.75 - 1.17 = 2.58

net_GA_per_90 => 0 - 2.06 = -2.06

Finally I sum the two together for a total net, that I usually denote as net_p90 in the charts:

Suarez: 1.83 - (-0.06) = 1.89

Allen: 2.58 - (-2.06) = 4.64

As you can see, for the same game two players had very different scores.

The data used to create the charts comes from the database. Most of the fields in the final chart are calculated on the fly from that data.

Here's a sample - if you don't know T-SQL this will look mostly like gibberish. If you know a little T-SQL this will look like almost-not-gibberish. If you know a lot of T-SQL I apologize for this (but in my defense no one is paying me to do it):

select

lastName

,pos

,round(((1.0*sum(gf)/sum(minutes)) - (sum(expected_gf)/count(*)/90.0))*90.0,2) net_gfp90

,round(((1.0*sum(ga)/sum(minutes)) - (sum(expected_ga)/count(*)/90.0))*90.0,2) net_gap90

,round((((1.0*sum(gf)/sum(minutes)) - (sum(expected_gf)/count(*)/90.0))*90.0)

- (((1.0*sum(ga)/sum(minutes)) - (sum(expected_ga)/count(*)/90.0))*90.0),2) net_p90

,sum(minutes) minutes

,sum(appearance) appearances

--,round(avg(points*1.0),2) avg_points

-- ,(sum(minutes)/sum(appearance)*1.0)/90.0*round(avg(talis_points*1.0),2) talisman_coeff

from (

select

player.lastName

,player.primaryPosition as pos

,A.gf

,A.ga

,A.minutes

,A.appearance

,case

when A.points = 0 then -2.625

else A.points

end as talis_points

,A.points as points

,(A.gf*1.0)/minutes actual_gf_per_minute

,case

when match.home_away='h' then team.adj_away_ga

when match.home_away='a' then team.adj_home_ga

end as expected_gf

,(A.ga*1.0)/minutes actual_ga_per_minute

,case

when match.home_away='h' then team.adj_away_gf

when match.home_away='a' then team.adj_home_gf

end as expected_ga

from factFixtures as A

inner join dimTeam as team on team.id=A.opponent_id

inner join dimMatch as match on match.id=A.match_id

inner join dimPlayer as player on A.player_id=player.id

where match.date_id >= 20131207--between 20131215 and 20131229

) tblA

--where /*pos in ('M') or */lastName='Mignolet'

group by lastName,pos having sum(appearance) > 9--having sum(minutes) > 605

order by 5 desc;

That produces this table:

lastName pos net_gfp90 net_gap90 net_p90 minutes appearances
Henderson MID 1.8 -0.22 2.02 1890 21
Flanagan FB 1.82 -0.04 1.85 1554 19
Toure CB 1.81 0 1.81 611 10
Cissokho FB 1.76 -0.03 1.78 783 10
Sturridge ST 2.03 0.28 1.74 1262 16
Suarez ST 1.71 0.02 1.7 2155 24
Skrtel CB 1.71 0.01 1.69 2160 24
Mignolet GK 1.71 0.01 1.69 2160 24
Johnson FB 1.75 0.13 1.63 1756 20
Sterling MID 1.59 -0.01 1.6 1805 24
Gerrard MID 1.74 0.23 1.51 1664 20
Allen MID 1.42 -0.02 1.45 1249 19
Coutinho MID 1.67 0.28 1.39 1587 23
Agger CB 1.33 -0.03 1.36 889 12
Sakho CB 1.43 0.08 1.35 898 10
Leiva MID 1.46 0.11 1.34 894 16
Moses MID 0.2 0.3 -0.1 209 11

So there you have it.

Each week, I take all of the league results and load them into my database. I've automated most of that but there's still a little manual work (like updating the league positions of the teams so I can do splits like "Against the Top 5 Teams blah blah").

Then I load in the Liverpool result with each player's data and run some scripts to calculate a few basic fields.

The rest is writing SQL queries to pull whatever I'm interested in.

Hopefully that makes sense - if not, feel free to leave a question!

Please make any posts here geared towards creating discussion. Any works that are copied and pasted from other sites will be deleted. This isn't a place to advertise services or your blog, it's an extension of our site. Please treat it as such.

In This FanPost

Teams
Players