Find the closest m points from a given point p

Consider you are given n data points in the form of list of tuples like S=[(x1,y1),(x2,y2),(x3,y3),(x4,y4),(x5,y5),…,(xn,yn)] and a point P=(p,q)
your task is to find 5 closest points(based on cosine distance) in S from P

Cosine distance between two points (x,y) and (p,q) is defined as
co
s
−1
(
(x⋅p+y⋅q)
(


x
2
+
y
2
)⋅
(


p
2
+
q
2
)

)
cos−1((x⋅p+y⋅q)(x2+y2)⋅(p2+q2))
Ex:

S= [(1,2),(3,4),(-1,1),(6,-7),(0, 6),(-5,-8),(-1,-1)(6,0),(1,-1)]
P= (3,-4)

Output:
(6,-7)
(1,-1)
(6,0)
(-5,-8)
(-1,-1)

Welcome to the forums, @d_p_reddy2004.

It is unclear why you are posting. Do you need help with this task? If so, please post your code so far and explain where you have a problem.

I am newbie to python. I am not able to find where to start.

This looks like a homework assignment, so you’re unlikely to find anybody here who will do it for you.

People are very willing to help you learn, but you need to demonstrate that you’re making an effort yourself.

def pClosest(points, K): 
  
    points.sort(key = lambda K: K[0]**2 + K[1]**2) 
  
    return points[:K] 

I am doing this for closest points to origin. But I need to do this for a given point. The formula cosine distance. How this can be done.

First by learning what cosine distance is and why it’s not relevant to finding the closest points to another point, unless you’re actually trying to talk about the similarity. of three points and their respective vectors.

Here’s what I want you to do, since you’re new to python.

Explain, in english, step by step what you need to do to find the solution. Worry about code once you’ve got your procedure down.

1 Like

EDIT: Okay, reread the OP. My brain will wake up eventually.

What you’re being tasked to find is not the CLOSEST points. It’s the most SIMILAR points.

How do you define cosine distance?

I have drawn the points on paper.
I understand that the cosine distance measures the angular distance between vectors a and b.
While cosine similarity measures the similarity between two vectors of an inner product space. It is measured by the cosine of the angle between two vectors and determines whether two vectors are pointing in roughly the same direction.

Since in our case we want to find the closest distance between two points, we must go with cosine distance. Hope I understanding correctly.

Cosine Distance = 1-Cosine Similarity. Angular distance is a different measure (though it is related, and is probably the metric you are ACTUALLY trying to use. It may be worth questioning your professor if they actually meant Angular Distance as the metric here)
Closest = smallest distance, so if you want the smallest distance, you want to make the similarity approach 1. (because when similarity = 1, Distance = 0.)

The ‘closest’ elements by cosine distance are the closest elements by similarity.
This is different from Euclidean Distance.
Lets see if you do understand: What’s the closest point to the point at (1,-1) on your piece of paper? What’s the farthest?

If we are asked to find the points that are closest geometrically, then you have to go with finding the euclidean distances.
If you are asked to find out the closest points by similarity(not geometrically) then go with finding cosine distances.

def cosine_similarity(v1,v2):
‘’'compute cosine similarity of v1 to v2: (v1 dot v2)/{||v1||||v2||)’’’
sumxx, sumxy, sumyy = 0, 0, 0
for i in range(len(v1)):
x = v1[i]; y = v2[i]
sumxx += x
x
sumyy += yy
sumxy += x
y
return sumxy/math.sqrt(sumxx*sumyy)

V1= [(1,2),(3,4),(-1,1),(6,-7),(0, 6),(-5,-8),(-1,-1)(6,0),(1,-1)]
V2= (3,-4)
cosine_similarity(V1,V2)

TypeError: ‘tuple’ object is not callable
is there a way we can fix it

Okay… i spent some more time staring at it and i understand what you were trying to do. Confused me.

You meant to pass only a single tuple to each of these variables. I understand now.
So lets assume you’ve passed a single tuple to each.
x = v1[i]; y = v2[i]
This is incorrect, technically.
v1[i] and v2[i] are not x and y, but ax and bx for a given index. (0 = x, 1 = y, 2 = z, etc), so its possible you’re messing yourself up with your labelling.
sumxx, then is you doing the A magnitude,
sumyy is you doing the B magnitude,
and sumxy is you doing the dot product.

I got you now.

So your primary problem is that you need to loop OUTSIDE the function so that you only send 1 tuple into it each time.

def cosine(v1, v2):
v1 = np.array(v1)
v2 = np.array(v2)
return np.dot(v1, v2) / (np.sqrt(np.sum(v12)) * np.sqrt(np.sum(v22)))

I want to achieve this using plain python and without any other libraries.

Yup, and you’ve pretty much done it.

V1 at this point is a list of tuples. Your function doesnt take a list of tuples, it takes a single tuple. So send it a single tuple instead.

# List of tuple initialization
lt = [('Geeks', 2), ('For', 4), ('geek', '6')] 
# using list comprehension 
out = [item for t in lt for item in t] 
# printing output 
print(out) 

The above code works fine. But when I use with my input as below it is not working. What could be the fix.

S= [(1,2),(3,4),(-1,1),(6,-7),(0, 6),(-5,-8),(-1,-1)(6,0),(1,-1)] 
# List of tuple initialization
# using list comprehension 
out = [item for t in S for item in t] 
# printing output 
print(out)

Here I get error :
TypeError: ‘tuple’ object is not callable

Look very closely… one of these pairs is missing something…

2 Likes

Yes, got it, it is the coma missing
S= [(1,2),(3,4),(-1,1),(6,-7),(0, 6),(-5,-8),(-1,-1),(6,0),(1,-1)]
Thank you

1 Like

I could get the cosine distance for all the points. Now I would need to print first five points in ascending order of their distance. How will this be achieved.

I got the answer. I have written the code as below:
sorted_d = sorted(dict1.items(), key=lambda x: x[1])

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.