SitePoint Sponsor

User Tag List

Results 1 to 7 of 7
  1. #1
    SitePoint Enthusiast
    Join Date
    Mar 2005
    Posts
    47
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    nearest neighbor algorithm in PHP

    Hi guys,

    I am trying to solve the nearest neighbor problem in PHP (see http://en.wikipedia.org/wiki/K-neare...hbor_algorithm))

    I have a list of x,y points representing Cartesian coordinates on a 2d plane. For any given point on the Cartesian plane, I need to find it's nearest neighbor(s).

    This is the first time for me working with the k-nn problem and appreciate any sort of guidance.

    I know I will need to start bisecting the plane, and building a tree. But looking for some more detailed steps on how to approach this. What are the steps? What kind of tree should I use? Any help on how to get started much appreciated. Thank you.

  2. #2
    Use The Cloud
    Join Date
    Jan 2006
    Location
    Boise, ID
    Posts
    556
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Here's a quick implementation, I think I got the algorithm right, I just glanced over it.

    PHP Code:
    <?php

    // Random cartesian coordinates (x, y) and labels
    $data = array(
        array(
    12'red'),   // 0 =>
        
    array(53'blue'),  // 1 =>
        
    array(-12'blue'), // 2 =>
        
    array(25'red'),   // 3 =>
        
    array(33'red'),   // 4 =>
        
    array(-45'blue'), // 5 =>
        
    array(22'blue'),  // 6 =>
        
    array(5, -2'red'),  // 7 =>
        
    array(-1, -2'blue'),// 8 =>
    );


    // Build distance matrix
    $distances $data;
    array_walk($distances'euclideanDistance'$data);


    // Example, target = datapoint 5, getting 3 nearest neighbors
    $neighbors getNearestNeighbors($distances53);

    echo 
    getLabel($data$neighbors) . "\n"// red


    /**
     * Calculates eucilean distances for an array dataset
     *
     * @param array $sourceCoords In format array(x, y)
     * @param array $sourceKey Associated array key
     * @param array $data 
     * @return array Of distances to the rest of the data set
     */
    function euclideanDistance(&$sourceCoords$sourceKey$data)
    {   
        
    $distances = array();
        list (
    $x1$y1) = $sourceCoords;
        foreach (
    $data as $destinationKey => $destinationCoords) {
            
    // Same point, ignore
            
    if ($sourceKey == $destinationKey) {
                continue;
            }
            list (
    $x2$y2) = $destinationCoords;
            
    $distances[$destinationKey] = sqrt(pow($x1 $x22) + pow($y1 $y22));
        }
        
    asort($distances);
        
    $sourceCoords $distances;
    }

    /**
     * Returns n-nearest neighbors
     *
     * @param array $distances Distances generated above ^
     * @param mixed $key Array key of source location
     * @param int $num Of neighbors to fetch
     * @return array Of nearest neighbors
     */
    function getNearestNeighbors($distances$key$num)
    {
        return 
    array_slice($distances[$key], 0$numtrue);
    }

    /**
     * Gets result label from associated data
     *
     * @param array $data 
     * @param array $neighbors Result from getNearestNeighbors()
     * @return string label
     */
    function getLabel($data$neighbors)
    {
        
    $results = array();
        
    $neighbors array_keys($neighbors);
        foreach (
    $neighbors as $neighbor) {
            
    $results[] = $data[$neighbor][2];
        }
        
    $values array_count_values($results);
        
    $values array_flip($values);
        
    ksort($values);
        return 
    array_pop($values);
    }
    Value of $distances:

    Code:
    Array
    (
        [0] => Array
            (
                [6] => 1
                [2] => 2
                [4] => 2.2360679775
                [3] => 3.16227766017
                [1] => 4.12310562562
                [8] => 4.472135955
                [7] => 5.65685424949
                [5] => 5.83095189485
            )
    
        [1] => Array
            (
                [4] => 2
                [6] => 3.16227766017
                [3] => 3.60555127546
                [0] => 4.12310562562
                [7] => 5
                [2] => 6.0827625303
                [8] => 7.81024967591
                [5] => 9.21954445729
            )
    
        [2] => Array
            (
                [0] => 2
                [6] => 3
                [8] => 4
                [4] => 4.12310562562
                [5] => 4.24264068712
                [3] => 4.24264068712
                [1] => 6.0827625303
                [7] => 7.21110255093
            )
    
        [3] => Array
            (
                [4] => 2.2360679775
                [6] => 3
                [0] => 3.16227766017
                [1] => 3.60555127546
                [2] => 4.24264068712
                [5] => 6
                [8] => 7.61577310586
                [7] => 7.61577310586
            )
    
        [4] => Array
            (
                [6] => 1.41421356237
                [1] => 2
                [0] => 2.2360679775
                [3] => 2.2360679775
                [2] => 4.12310562562
                [7] => 5.38516480713
                [8] => 6.40312423743
                [5] => 7.28010988928
            )
    
        [5] => Array
            (
                [2] => 4.24264068712
                [0] => 5.83095189485
                [3] => 6
                [6] => 6.7082039325
                [4] => 7.28010988928
                [8] => 7.61577310586
                [1] => 9.21954445729
                [7] => 11.401754251
            )
    
        [6] => Array
            (
                [0] => 1
                [4] => 1.41421356237
                [3] => 3
                [2] => 3
                [1] => 3.16227766017
                [7] => 5
                [8] => 5
                [5] => 6.7082039325
            )
    
        [7] => Array
            (
                [1] => 5
                [6] => 5
                [4] => 5.38516480713
                [0] => 5.65685424949
                [8] => 6
                [2] => 7.21110255093
                [3] => 7.61577310586
                [5] => 11.401754251
            )
    
        [8] => Array
            (
                [2] => 4
                [0] => 4.472135955
                [6] => 5
                [7] => 6
                [4] => 6.40312423743
                [3] => 7.61577310586
                [5] => 7.61577310586
                [1] => 7.81024967591
            )
    )
    Value of $neighbors of point 5 (-4, 5), 3 nearest:

    Code:
    Array
    (
        [2] => 4.24264068712
        [0] => 5.83095189485
        [3] => 6
    )
    2 is blue, 0 is red, and 3 is red, which makes point 5 red.

    In this example we ignore the source point's existing label and reclassify it, but it's just an example.

    HTH

    Edit:

    Oh, and it doesn't do the recursive dive if an equal number of labels exist for a value k, it instead returns one arbitrarily
    Brad Hanson, Web Applications & Scalability Specialist
    ► Is your website outgrowing its current hosting solution?
    ► PM me for a FREE scalability consult!
    ► USA Based: Available by Phone, Skype, AIM, and E-mail.

  3. #3
    SitePoint Enthusiast
    Join Date
    Mar 2005
    Posts
    47
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hey thanks for the reply! I will take a look at your solution.

    I think really the best solution to this is to use a KDTree. But I have no idea how to implement one.

  4. #4
    Use The Cloud
    Join Date
    Jan 2006
    Location
    Boise, ID
    Posts
    556
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    http://en.wikipedia.org/wiki/Kd-tree

    Provides pretty detailed steps on the creation of a Kd-tree.

    Remember to read the references section of the PHP manual as it's slightly different than what you probably learned in your C or Java class.
    Brad Hanson, Web Applications & Scalability Specialist
    ► Is your website outgrowing its current hosting solution?
    ► PM me for a FREE scalability consult!
    ► USA Based: Available by Phone, Skype, AIM, and E-mail.

  5. #5
    SitePoint Enthusiast
    Join Date
    Mar 2005
    Posts
    47
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hey, OK guys, so I followed the algorithm in the wikipedia article and constructed my kdTree with my point set.

    I am now trying to implement the nearest neighbor search. The wikipedia example shows how to find the closest neighbor, but I want the nearest three. Could someone breakdown the algo for me?

    thanks!

  6. #6
    SitePoint Enthusiast
    Join Date
    Mar 2005
    Posts
    47
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    anyone?

  7. #7
    SitePoint Member
    Join Date
    Sep 2010
    Posts
    1
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The OP has a two-dimensional data set. Why the hell is there a need to model the data in a KDTree before running the nearest neighbor algorithm?


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •