Football transfer fees in Europe's 'Big 5' leagues

Football transfers are big business. Since 2000 nearly €45 billion has been spent by teams in the top divisions of England, France, Germany, Italy and Spain, the so-called 'Big 5' leagues - the world's richest playing markets.

The numbers surrounding the buying and selling of players are the focus of this project.

Data

The data we used came from transfermarkt.com, a website that is a treasure trove for football geeks. It aggregates data about footballers, their clubs and the competitions they play in. It lists things like age, height, goals, assists, contract length and transfer fees - a player's vital statistics.

The finances of football transfers are murky. FIFA, the sport's world governing body, registers player transfers, and you can access its data, but only if you pay.

Data caveat

Transfermarkt have a good range of data, but it is not 'scientific': there is no guarantee that the transfer fees they list are 100% accurate. Still, the industry seems to regard them as a fairly reliable source.

The data scrape

Transfermarkt guard their data. The way their site is structured makes it hard, but not impossible, to get at their numbers, and they seem particularly wary of being 'scraped'. With the help of @paldhous we found a workaround: an R script that mimicked human web browsing, which got us past the site's defences so the data could be scraped. Initially we wanted to scrape all of their pages, but we settled on just those covering transfers in the 'Big 5' leagues.
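The original scraper was an R script whose details are not given here. As a rough illustration of the 'mimic a human browser' idea, here is a minimal Python sketch using only the standard library: it sends a browser-like User-Agent header and waits a random interval between requests. The header string, the 2 to 5 second delay range and the function names are assumptions, not the settings the project actually used.

```python
import random
import time
import urllib.request

# Assumed browser-like header; not taken from the project's R script.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"}


def build_request(url):
    """Return a GET request carrying browser-like headers."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_pages(urls, fetch=None, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random interval between requests.

    `fetch` takes a Request and returns the page body as text. It defaults
    to urllib, but can be swapped for a stub when testing.
    """
    if fetch is None:
        def fetch(req):
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.read().decode("utf-8", errors="replace")

    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Random pause so requests arrive at a human-like pace.
            sleep(random.uniform(min_delay, max_delay))
        pages[url] = fetch(build_request(url))
    return pages
```

Passing `fetch` and `sleep` in as parameters keeps the pacing logic testable without touching the network.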

The scrape results

The scrape provided data on 2,500 transfers, including the player's name, age and playing position, the team they joined and left, the league they joined, the transfer fee, the player's market value and the season in which the transfer took place.

The data was fairly clean but still needed some tidying. The player position field had too many categories, and they were too specific: for the purposes of this project, comparing forwards with second strikers was a step too far. To simplify the analysis I created one column with 4 broad player positions and another with 9.
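The position clean-up amounts to a lookup from detailed labels to broad groups. Below is a minimal Python sketch of the four-group column, assuming Transfermarkt-style labels such as 'Second Striker' and 'Centre-Back'. Both the label spellings and the groupings are illustrative assumptions; the nine-group column would follow the same pattern with a second dictionary.

```python
# Assumed mapping from detailed position labels to four broad groups.
BROAD_POSITION = {
    "Goalkeeper": "Goalkeeper",
    "Centre-Back": "Defender",
    "Left-Back": "Defender",
    "Right-Back": "Defender",
    "Defensive Midfield": "Midfielder",
    "Central Midfield": "Midfielder",
    "Attacking Midfield": "Midfielder",
    "Left Midfield": "Midfielder",
    "Right Midfield": "Midfielder",
    "Left Winger": "Forward",
    "Right Winger": "Forward",
    "Second Striker": "Forward",
    "Centre-Forward": "Forward",
}


def broad_position(detailed):
    """Collapse a detailed position label into one of four groups.

    Labels not in the mapping come back as 'Unknown' so they are easy to spot.
    """
    return BROAD_POSITION.get(detailed.strip(), "Unknown")
```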

Hypothesis/es

I approached this project with a theory that young players (under the age of 21) have increased in value over recent seasons.

In 2001 the record transfer fee for a player under 21 was US$25 million; now it stands at more than US$145 million.
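To put the jump in the under-21 record fee in numbers: a rise from $25 million to $145 million is a 5.8-fold increase. The sketch below also converts that into an implied compound annual growth rate. Treating 'now' as 2018 is an assumption, and because the current record is 'more than' $145 million, the result is a lower bound.

```python
def growth(old_fee, new_fee, years):
    """Return the overall multiple and the implied compound annual growth rate."""
    multiple = new_fee / old_fee
    cagr = multiple ** (1 / years) - 1
    return multiple, cagr


# Fees in US$ millions; the end year of 2018 is an assumption.
multiple, cagr = growth(25, 145, 2018 - 2001)
print(f"{multiple:.1f}x overall, about {cagr:.1%} a year")
```

With these inputs it reports a 5.8x rise, or roughly 11% a year.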

Could we validate this hypothesis visually?

Transfers = big money

Football clubs have spent a lot of money on transfers since 2000...

i) Cumulative spend
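A cumulative-spend chart needs a running total of fees by season. Here is a minimal Python sketch, assuming each transfer is reduced to a (season, fee) pair with seasons written as 'YYYY/YYYY' (so they sort correctly as text) and undisclosed fees stored as None. The function name and data layout are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate


def cumulative_spend(transfers):
    """Return (season, running total) pairs in season order.

    `transfers` is an iterable of (season, fee) tuples; fees of None
    (assumed to mean undisclosed) are skipped.
    """
    per_season = defaultdict(float)
    for season, fee in transfers:
        if fee is None:
            continue
        per_season[season] += fee
    seasons = sorted(per_season)
    totals = accumulate(per_season[s] for s in seasons)
    return list(zip(seasons, totals))
```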

The English Premier League is King

English clubs have spent more than anyone else cumulatively, but their dominance has become most pronounced in the last seven years.
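One way to quantify that dominance over a chosen run of seasons is each league's share of total spend. Here is a minimal Python sketch, assuming (season, league, fee) tuples; the function name and data layout are hypothetical, and the seasons to include (for example, the last seven) are passed in explicitly.

```python
from collections import defaultdict


def league_share(transfers, seasons):
    """Return each league's share of total spend over the given seasons.

    `transfers` holds (season, league, fee) tuples. Only seasons listed in
    `seasons` are counted, and fees of None are skipped.
    """
    wanted = set(seasons)
    totals = defaultdict(float)
    for season, league, fee in transfers:
        if season in wanted and fee is not None:
            totals[league] += fee
    grand = sum(totals.values())
    if not grand:
        return {}
    return {league: spend / grand for league, spend in totals.items()}
```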

i) Area

This was too hard to read, however good it looks.

ii) Line

This is easier to read and shows more nuance and detail: the peaks and troughs. Bear in mind that the 2018/2019 season is only halfway through.

Also, look at Italy . . .

Visualizing individual transfer deals

i) Scatterplot across the seasons

This started out messy, but with some tweaks it became much clearer and gives some visual context to the outliers. As a means of testing our hypothesis, though, it doesn't really work.

The cost of potential

At this point we needed a new variable, so we added a column to the data, Over/Under, marking whether each player was over or under 21, to examine the cost of young players more closely. The results, however, were quite frustrating.
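The Over/Under column is a simple age threshold. The Python sketch below assumes 'Under' means younger than 21, matching the hypothesis, and compares median fees per season and age band using (season, age, fee) tuples; the function names are hypothetical. Using the median rather than the mean is a suggestion, not what the project did: a handful of record fees can drag an average a long way.

```python
from collections import defaultdict
from statistics import median


def age_band(age, cutoff=21):
    """Label a player 'Under' if younger than the cutoff, else 'Over'."""
    return "Under" if age < cutoff else "Over"


def median_fee_by_band(transfers, cutoff=21):
    """Return {(season, band): median fee} from (season, age, fee) tuples."""
    groups = defaultdict(list)
    for season, age, fee in transfers:
        if fee is not None:
            groups[(season, age_band(age, cutoff))].append(fee)
    return {key: median(fees) for key, fees in sorted(groups.items())}
```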

These graphs showed some interesting things, but were they really telling us enough?

Other attempts didn't really provide further clarity.

At this time we were quite a long way from proving our hypothesis; it was time to pivot.

Football is a rich man's game

We started looking at clubs' spending habits.

This became quite interesting. We can see the dominance of English clubs again.

You can select a specific club and see how they have done.
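The club view boils down to filtering on one club and totalling what it paid and received each season. Here is a minimal Python sketch, assuming (season, club joined, club left, fee) tuples; the names are hypothetical, and net figures only reflect deals that appear in the scraped 'Big 5' data.

```python
from collections import defaultdict


def club_spending(transfers, club):
    """Return {season: (spent, received, net)} for one club.

    `transfers` holds (season, joined, left, fee) tuples, where `joined` is
    the buying club and `left` the selling club. Fees of None are skipped.
    """
    by_season = defaultdict(lambda: [0.0, 0.0])
    for season, joined, left, fee in transfers:
        if fee is None:
            continue
        if joined == club:
            by_season[season][0] += fee  # money paid out
        if left == club:
            by_season[season][1] += fee  # money received
    return {
        season: (spent, received, spent - received)
        for season, (spent, received) in sorted(by_season.items())
    }
```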

I think that this graph and the scatterplot of individual transfers are the most interesting.

But our hypothesis requires further analysis.