# How to implement a k-nearest-neighbour algorithm for a football prediction system in PHP?

I am trying to write a k-nearest-neighbour algorithm for a football prediction system which consists of criteria such as player’s rating, player’s form, team ranking, and venue. These are the criteria that are needed for the prediction, but I don’t know how to implement the AI in PHP. Can anyone guide me on how to implement an AI algorithm for this prediction?

It’s not really AI but rather a simple algorithm which can be implemented however you like. It takes the thing you want to predict, like the player’s average score per game, and estimates it by comparison with the most similar known cases, a bit like linear interpolation. It usually has an error margin of up to 30% - because it’s generally impossible, in a situation like this, to predict anything meaningful with high accuracy.

The whole algorithm revolves around basic Euclidean geometry. What you want to do is find the k ‘nearest neighbours’ on a 4-dimensional theoretical graph: one dimension for rating, one for form, one for ranking and one for venue. The venue will need to be converted to a number based on field quality.

These numbers should also be weighted correctly. You don’t want venue to affect the outcome more than player rating, for example. Team ranking is an awkward one - team rating would be better. The top 5 teams are generally pretty close in terms of skill, teamwork and player ability - as are the bottom 5. So you could apply a function to the team ranking that reduces the impact of an individual position but increases the impact of the overall rating.
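As a rough sketch of the weighting idea, the snippet below scales each attribute to the range 0..1, groups league positions into tiers so that neighbouring positions count as equal, and then applies a weight per attribute. The weights, the value ranges (0..10 for rating, form and venue quality, a 20-team league) and all function names are assumptions made for illustration, not values taken from real data.

```php
<?php
// Hypothetical weights: higher means the attribute matters more.
// These need tuning against real results.
const WEIGHTS = [
    'rating'  => 1.0,
    'form'    => 0.8,
    'ranking' => 0.5,
    'venue'   => 0.2,
];

/**
 * Group a league position (1 = best) into a tier so that positions
 * 1-5 count as equal, as do 6-10, and so on.
 */
function rankingToTier(int $position, int $bucketSize = 5): int
{
    return intdiv($position - 1, $bucketSize) + 1;
}

/** Scale a raw value into 0..1 so units do not distort the distance. */
function normalise(float $value, float $min, float $max): float
{
    if ($max <= $min) {
        return 0.0;
    }
    return ($value - $min) / ($max - $min);
}

/**
 * Turn raw attributes into weighted, comparable numbers.
 * Assumed ranges: rating, form and venue quality 0..10, position 1..20.
 */
function weightedFeatures(array $raw): array
{
    $scaled = [
        'rating'  => normalise($raw['rating'], 0, 10),
        'form'    => normalise($raw['form'], 0, 10),
        'ranking' => normalise(rankingToTier($raw['position']), 1, 4),
        'venue'   => normalise($raw['venue'], 0, 10),
    ];
    foreach ($scaled as $key => $value) {
        $scaled[$key] = $value * WEIGHTS[$key];
    }
    return $scaled;
}

$features = weightedFeatures(['rating' => 8, 'form' => 5, 'position' => 18, 'venue' => 10]);
```

With these assumed weights a full-range difference in venue quality contributes at most 0.2 to any single term, while rating can contribute up to 1.0, which is the ordering the paragraph above asks for.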

To find the k nearest neighbours, simply go through all the other players and subtract each of their attributes from the matching attribute of the player in question. Square each of these differences. The exponent is 2 no matter how many attributes you have, and because a square is never negative there is no need for abs(). Sum all these squares up to get a value N, the squared Euclidean distance (taking the square root gives the true distance but does not change the ordering). The closest k neighbours have the k smallest values of N.
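Here is a minimal sketch of the neighbour search using squared Euclidean distance, meaning each difference is squared and the squares are summed. The array layout (a `features` sub-array per player), the function names and the sample numbers are all made up for illustration, and the features are assumed to be already scaled and weighted.

```php
<?php
/**
 * Squared Euclidean distance between two feature arrays with the same keys.
 * The square root is skipped because it does not change the ordering.
 */
function squaredDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $key => $value) {
        $diff = $value - $b[$key];
        $sum += $diff * $diff; // a square is never negative
    }
    return $sum;
}

/** Return the $k players closest to $target, nearest first. */
function nearestNeighbours(array $target, array $players, int $k): array
{
    usort($players, function (array $x, array $y) use ($target): int {
        return squaredDistance($target, $x['features'])
            <=> squaredDistance($target, $y['features']);
    });
    return array_slice($players, 0, $k);
}

// Made-up sample data.
$players = [
    ['name' => 'A', 'features' => ['rating' => 0.9, 'form' => 0.8, 'ranking' => 0.1, 'venue' => 0.5]],
    ['name' => 'B', 'features' => ['rating' => 0.4, 'form' => 0.3, 'ranking' => 0.7, 'venue' => 0.5]],
    ['name' => 'C', 'features' => ['rating' => 0.8, 'form' => 0.7, 'ranking' => 0.2, 'venue' => 0.4]],
];
$target = ['rating' => 0.85, 'form' => 0.75, 'ranking' => 0.15, 'venue' => 0.5];

$closest = nearestNeighbours($target, $players, 2);
foreach ($closest as $player) {
    echo $player['name'], "\n";
}
```

Sorting the whole list is fine for a few thousand players. For much larger data sets, keeping only the best k seen so far would avoid the full sort.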

The players with the smallest values of N are likely to have a similar value for whatever you’re trying to guess, so you can base the prediction on their known results, for example by averaging them. The value of k you use will have to be monitored and modified often until you find the right kind of numbers, as will the weightings on each value.
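One simple way to turn the neighbours into a guess, sketched below, is to average their known outcomes and then compare a few values of k against a result you already know. The `outcome` field, the function name and the numbers are hypothetical, and averaging is only one option (a majority vote would suit a win/draw/loss prediction better).

```php
<?php
/**
 * Predict a value as the mean outcome of the $k nearest neighbours.
 * $neighbours is assumed to be sorted nearest first.
 */
function predictFromNeighbours(array $neighbours, int $k): float
{
    $nearest = array_slice($neighbours, 0, $k);
    if (count($nearest) === 0) {
        throw new InvalidArgumentException('At least one neighbour is required.');
    }
    return array_sum(array_column($nearest, 'outcome')) / count($nearest);
}

// Made-up neighbours, nearest first; 'outcome' could be goals per game.
$neighbours = [
    ['name' => 'A', 'outcome' => 2.0],
    ['name' => 'C', 'outcome' => 1.0],
    ['name' => 'B', 'outcome' => 0.0],
];

// Try several values of k against a known result to see which fits best.
$actual = 1.4;
foreach ([1, 2, 3] as $k) {
    $guess = predictFromNeighbours($neighbours, $k);
    printf("k=%d predicted=%.2f error=%.2f\n", $k, $guess, abs($guess - $actual));
}
```

Repeating that comparison over many past matches, rather than the single made-up result here, is what lets you settle on a value of k and a set of weights.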

Technically speaking, most of this can be done via MySQL. You’d need to build the specific query in PHP, but MySQL can do the legwork by computing the distance for each row and returning the k closest using ORDER BY and LIMIT clauses.
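A possible shape for that query is sketched below. The table name `players`, the columns `id`, `outcome`, `rating`, `form`, `ranking` and `venue`, and the function name are assumptions, and the stored columns are assumed to be already scaled and weighted. The named placeholders are meant to be bound through a PDO prepared statement.

```php
<?php
/**
 * Build a parameterised query that returns the $k rows closest to the
 * target values. Table and column names are hypothetical.
 */
function buildNearestQuery(int $k): string
{
    $columns = ['rating', 'form', 'ranking', 'venue'];
    $terms = [];
    foreach ($columns as $column) {
        // Squared difference between the stored value and the target value.
        $terms[] = "POW($column - :$column, 2)";
    }
    $distance = implode(' + ', $terms);

    return "SELECT id, outcome, ($distance) AS distance "
         . "FROM players ORDER BY distance ASC LIMIT " . max(1, $k);
}

echo buildNearestQuery(5), "\n";

// Usage sketch (needs a real database connection, so left commented out):
// $stmt = $pdo->prepare(buildNearestQuery(5));
// $stmt->execute(['rating' => 0.85, 'form' => 0.75, 'ranking' => 0.15, 'venue' => 0.5]);
// $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

Because the distance is computed for every row, this scans the whole table on each call. That is usually acceptable for a league-sized data set.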