How to predict sport results with Big Data

Our Data Science team at 7Puentes organized an internal tournament based on building what is commonly known as a “bot” (a computer program that performs tasks automatically) whose objective is to predict the results of Argentina’s Division A matches. “It all started as a game, because we are curious, lively and, worth mentioning, football fans,” said Charly Lizarralde, CEO of 7Puentes.

In the context of this competition, we try to predict the number of goals that will be scored in a given match of the Argentine football league. We take into account that matches usually average around 2.5 goals, which makes 2.5 a natural dividing line: roughly half of matches end with 3 or more goals and half with 0, 1 or 2. Betting sites call the different types of bets “markets”, since the betting “books” usually behave very much like order books in financial markets, as in the image below.
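As a minimal sketch of the target behind this market, assuming each match is stored as home and away team names plus final scores (the `Match` class, `GOAL_LINE` and the helper names below are hypothetical, not taken from our code), labelling a match as over or under the 2.5 line and measuring how often matches go over could look like this:

```python
from dataclasses import dataclass

# The 2.5-goal line of the "over/under" market (assumed for illustration).
GOAL_LINE = 2.5


@dataclass
class Match:
    home: str
    away: str
    home_goals: int
    away_goals: int


def is_over(match: Match, line: float = GOAL_LINE) -> bool:
    """True if total goals exceed the line (3 or more goals for a 2.5 line)."""
    return match.home_goals + match.away_goals > line


def share_over(matches: list[Match], line: float = GOAL_LINE) -> float:
    """Fraction of matches whose total goals went over the line."""
    if not matches:
        raise ValueError("need at least one match")
    return sum(is_over(m, line) for m in matches) / len(matches)


# Hypothetical scores, for illustration only
matches = [
    Match("Team A", "Team B", 2, 1),  # 3 goals -> over
    Match("Team C", "Team D", 0, 0),  # 0 goals -> under
    Match("Team E", "Team F", 1, 1),  # 2 goals -> under
    Match("Team G", "Team H", 3, 2),  # 5 goals -> over
]
print(share_over(matches))  # 0.5
```

Because the line sits on a half goal, a match can never land exactly on it, so every match is cleanly either over or under.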

Several features come into play in this estimator, such as the head-to-head record between the two teams, their positions in the standings, and the average number of goals each team scores at home or away (depending on its role in that fixture), among others. Many of these statistics are public and can be found on sites like Promiedos. We’ve built a small Scrapy crawler that puts everything into a simple MySQL database, so anyone can explore the data and select their favorite features for the model. Here is the gist with the code for that spider (the spider now updates every week…)
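The features above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not our actual feature code: matches are plain `(home, away, home_goals, away_goals)` tuples (for example, rows read back from the crawler's database, whose real schema is not shown here), all function names are hypothetical, and standings positions are left out because they depend on the league's points rules.

```python
from collections import defaultdict


def goal_averages(matches):
    """Average goals scored by each team at home and away.

    `matches` is an iterable of (home, away, home_goals, away_goals) tuples.
    Returns {team: {"home": avg, "away": avg}}; a side with no games is omitted.
    """
    # Per team and side: [goals scored, games played]
    totals = defaultdict(lambda: {"home": [0, 0], "away": [0, 0]})
    for home, away, home_goals, away_goals in matches:
        totals[home]["home"][0] += home_goals
        totals[home]["home"][1] += 1
        totals[away]["away"][0] += away_goals
        totals[away]["away"][1] += 1
    return {
        team: {side: goals / games for side, (goals, games) in sides.items() if games}
        for team, sides in totals.items()
    }


def winner(home, away, home_goals, away_goals):
    """Name of the winning team, or None for a draw."""
    if home_goals > away_goals:
        return home
    if away_goals > home_goals:
        return away
    return None


def match_features(history, home, away):
    """Hypothetical feature vector for an upcoming home-vs-away fixture."""
    avgs = goal_averages(history)
    # Winners of previous meetings between these two teams (either venue)
    h2h = [winner(*m) for m in history if {m[0], m[1]} == {home, away}]
    return [
        avgs.get(home, {}).get("home", 0.0),  # home team's average goals at home
        avgs.get(away, {}).get("away", 0.0),  # away team's average goals away
        h2h.count(home),                      # head-to-head wins for the home side
        h2h.count(away),                      # head-to-head wins for the away side
    ]


# Hypothetical results, for illustration only
history = [
    ("Team A", "Team B", 2, 1),
    ("Team B", "Team A", 0, 0),
    ("Team A", "Team C", 3, 1),
    ("Team C", "Team B", 1, 2),
]
print(match_features(history, "Team A", "Team B"))  # [2.5, 1.5, 1, 0]
```

Only the side each team plays in the upcoming fixture is used (home average for the local team, away average for the visitor), which mirrors the idea of picking the statistic "as per the case of that date".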

“Each one of us developed our own version of this model, including the features that our own algorithm considers relevant,” explained Lizarralde, adding: “The best part is that it can be used not only for football, the Argentine sport par excellence, but also for any other sport that has a scoreboard.”

Teams have all summer (until the start of the next tournament), and the winning “bot” will go into production and place real bets on the site. To inspire our Data Scientists, a basic model (a random forest classifier) is also provided to bootstrap the competition. You can do it yourself: all the code is in Python! 7P’s commitment to participants is that the bot’s profits will be distributed at the end of the championship.
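A baseline in the spirit of the one handed to participants could look like the sketch below. It assumes scikit-learn's `RandomForestClassifier` (the post does not say which library the starter model uses), four hypothetical features (the home team's average goals at home, the away team's average goals away, and head-to-head wins for each side) and an over/under 2.5 target. The data is synthetic, generated with NumPy purely so the example runs; it is not real league data, and the accuracy it prints says nothing about real performance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the scraped data: NOT real match statistics.
# Columns: home team's home goal average, away team's away goal average,
# head-to-head wins for the home side, head-to-head wins for the away side.
n = 500
X = np.column_stack([
    rng.uniform(0.5, 2.5, n),
    rng.uniform(0.3, 2.0, n),
    rng.integers(0, 5, n),
    rng.integers(0, 5, n),
])

# Toy target: total goals drawn from a Poisson distribution whose mean is the
# sum of the two goal averages, labelled 1 if the match goes over 2.5 goals.
total_goals = rng.poisson(X[:, 0] + X[:, 1])
y = (total_goals > 2.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print("test accuracy:", model.score(X_test, y_test))
# Estimated probability that a (hypothetical) upcoming fixture goes over 2.5
print("P(over 2.5):", model.predict_proba([[1.8, 1.1, 2, 1]])[0, 1])
```

With real data, `X` would be built from the scraped matches, and it would be safer to evaluate on dates later than the training set rather than on a random split, so the model never learns from matches played after the ones it predicts.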