Check the output of a single document in a collection of documents

d_p_reddy2004 · August 2, 2019, 5:51am

To check the output of a single document in my collection of documents, how can I convert only the part of the sparse matrix related to that document into a dense matrix and print it?

This is my code:

```python
from collections import Counter
from tqdm import tqdm
from scipy.sparse import csr_matrix
import math
import operator
from sklearn.preprocessing import normalize
import numpy


def transform(stringData, vocab):
    rows = []
    columns = []
    values = []
    if isinstance(stringData, list):
        for idx, row in enumerate(tqdm(stringData)):  # for each document in the dataset
            # Counter maps each word in the document to its frequency: {word: frequency}
            word_freq = dict(Counter(row.split()))
            for word, freq in word_freq.items():  # for each unique word in the document
                # skip single-character words
                if len(word) < 2:
                    continue
                # look the word up in the vocabulary built by fit();
                # with this default, dict.get() returns -1 if the key does not exist
                col_index = vocab.get(word, -1)  # retrieve the column index of the word
                # if the word exists in the vocabulary
                if col_index != -1:
                    rows.append(idx)           # index of the document
                    columns.append(col_index)  # column (dimension) of the word
                    values.append(freq)        # frequency of the word
        matrix = csr_matrix((values, (rows, columns)), shape=(len(stringData), len(vocab)))
        return matrix
    else:
        print("you need to pass list of strings")
```

m_hutley · August 2, 2019, 9:27am

Well, what’s the definition of a sparse matrix, how is it different from a dense matrix, and what does that mean you need to do in order to go from one to the other?

(And to the smart alec posters coming behind me, yes I’m familiar with the definitions; I’m asking the OP for a reason.)

d_p_reddy2004 · August 2, 2019, 9:52am

A sparse matrix is a matrix that is comprised of mostly zero values.
Sparse matrices are distinct from matrices with mostly non-zero values, which are referred to as dense matrices.
Sparse matrices can cause problems with regards to space and time complexity.
The zero values can be ignored and only the data or non-zero values in the sparse matrix need to be stored or acted upon.
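
To make that definition concrete, here is a minimal sketch, assuming SciPy's CSR format (the same `csr_matrix` class used in the code above). The small matrix and the variable names are illustrative only and are not taken from the thread's data.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A small, mostly-zero matrix (illustrative values only).
dense = np.array([[0, 0, 3, 0],
                  [0, 0, 0, 0],
                  [1, 0, 0, 2]])

sparse = csr_matrix(dense)

# CSR keeps only the non-zero values plus their positions.
print(sparse.data)     # the stored non-zero values, row by row
print(sparse.indices)  # the column index of each stored value
print(sparse.indptr)   # where each row starts within data/indices
print(sparse.nnz)      # how many values are stored

# toarray() rebuilds the full dense array, zeros included.
restored = sparse.toarray()
print(restored)
```

The dense form spends memory on every cell, while the sparse form spends it only on the stored values and their indices, which is why the distinction matters for large vocabularies.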

m_hutley · August 2, 2019, 9:54am

So as a programmer, what do you need to do to go from sparse to dense?

d_p_reddy2004 · August 2, 2019, 10:03am

There are multiple data structures we can use to construct a sparse matrix. In my code given above, I have constructed the sparse matrix. I have given multiple documents as input. But now I would need to check the output of a single document in my collection of documents.

I tried using the todense method on the matrix, but it does not work.
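
For reference, a single document corresponds to a single row of the matrix, so one possible approach is to slice that row first and convert only the slice. This is a minimal sketch, assuming `matrix` is a SciPy `csr_matrix` like the one `transform` returns; the values and the name `doc_index` are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Stand-in for the matrix returned by transform(); values are illustrative.
matrix = csr_matrix(np.array([[0, 1, 0, 2],
                              [3, 0, 0, 0]]))

doc_index = 1  # hypothetical: the document to inspect

# Row-slicing a csr_matrix gives a 1 x n sparse matrix ...
row_sparse = matrix[doc_index]

# ... which can then be converted on its own.
row_as_array = row_sparse.toarray()   # numpy.ndarray of shape (1, n)
row_as_matrix = row_sparse.todense()  # numpy.matrix of shape (1, n)

print(row_as_array)
print(type(row_as_matrix))
```

Both calls exist on SciPy sparse matrices; they differ in return type, with `toarray()` giving a plain NumPy array and `todense()` giving a `numpy.matrix`.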

m_hutley · August 2, 2019, 10:13am

Well that was a lot of words that had nothing to do with how you would go from a sparse matrix to a dense one.

Let’s try that again.

Give me an example of your document collection, and what you would expect the resultant dense matrix to look like.

d_p_reddy2004 · August 2, 2019, 10:14am

input dataset

dataset2 = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
            "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]

m_hutley · August 2, 2019, 10:15am

Is that the before or the after? What’s the other half of the operation?

d_p_reddy2004 · August 2, 2019, 10:17am

The first half of the operation is writing a custom fit method.

m_hutley · August 2, 2019, 10:21am

So… no.

I’ve asked you to give me an example of your document collection, which is supposed to be a sparse matrix, and then take that example and show me what you think the result of making it dense should be.

You’ve given me a list that contains no nonzero elements, which does not constitute a sparse matrix, and when asked for the other half, given me a sentence rather than a matrix.

Either you have the wrong dataset, the wrong understanding of a matrix, or you’ve shown me the output without the input.

d_p_reddy2004 · August 2, 2019, 10:26am

input
strings = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
           "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]
vocab = fit(strings)
print(list(vocab.keys()))
matrix1 = transform(strings, vocab).toarray()
print(matrix1)
=============================
output 

[[0 0 0 1 1 1 0 1 1 1 1 1 0 1 1 0 0 2 0 0 0 1]
 [1 1 1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 1 1 1 1 0]]

I was comparing this output with the output I got from sklearn's CountVectorizer.
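
A sketch of that comparison, assuming scikit-learn's `CountVectorizer` with its default settings, which lowercase the text, drop single-character tokens, and order the vocabulary alphabetically. The `expected` array is copied from the output above; the names `reference` and `matches` are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

strings = [
    "the method of lagrange multipliers is the economists workhorse for solving optimization problems",
    "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly",
]

# Output of the custom transform(), copied from the post above.
expected = np.array([
    [0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0],
])

vectorizer = CountVectorizer()                 # default settings
reference = vectorizer.fit_transform(strings)  # returns a sparse matrix

# Compare the dense forms element by element.
matches = np.array_equal(reference.toarray(), expected)
print(reference.toarray())
print(matches)
```

If the custom `fit` assigns column indices in a different order from the alphabetical one assumed here, the columns would need to be aligned before comparing.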

m_hutley · August 2, 2019, 10:35am

So there’s your input. Not particularly sparse, but we’ll call it sparse for the moment.

Now what should it look like once it’s dense?

d_p_reddy2004 · August 2, 2019, 10:39am

Step 1) I have to make sure the output of my implementation is a sparse matrix. Before generating the final output, I need to normalize my sparse matrix using L2 normalization.
Step 2) To check the output of a single document in my collection of documents, I need to convert only the part of the sparse matrix related to that document into a dense matrix and print it.

Now, assuming we have the sparse matrix as given above, we would need to do these 2 steps.
I have not been given any sample of what it should look like.
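
The two steps can be sketched as follows. This is a minimal sketch under some assumptions: the count matrix is rebuilt from the output shown earlier in the thread, L2 normalization is taken to mean scaling each row to unit Euclidean length, and `doc_index` is a hypothetical name for the document to inspect. `sklearn.preprocessing.normalize(matrix, norm="l2")` should give the same result; the version below uses only NumPy and SciPy so that each step is visible.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags

# Count matrix copied from the output posted earlier in the thread.
counts = csr_matrix(np.array([
    [0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0],
]))

# Step 1: L2-normalize each row while keeping the matrix sparse.
row_norms = np.sqrt(np.asarray(counts.multiply(counts).sum(axis=1)).ravel())
row_norms[row_norms == 0] = 1.0  # leave all-zero rows unchanged
normalized = (diags(1.0 / row_norms) @ counts).tocsr()

# Step 2: convert only one document's row to dense and print it.
doc_index = 0  # hypothetical: the document to inspect
doc_dense = normalized[doc_index].toarray()  # shape (1, vocabulary size)
print(doc_dense)
```

Only the selected row is ever made dense, so the rest of the matrix stays in its compact form.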

m_hutley · August 2, 2019, 10:40am

You have successfully read off your homework paper.

Now answer my question.

What should this example look like when you’re done? Do it manually.

d_p_reddy2004 · August 2, 2019, 10:43am

I will do it

m_hutley · August 2, 2019, 10:45am

Then you haven’t learned the lesson you were taught.

I suggest you go back to your professor and tell them you’re struggling with the concepts, rather than blindly hoping someone will answer your homework for you so that you don’t have to learn something.

[To those that follow, I release this thread to the wilds.]

d_p_reddy2004 · August 2, 2019, 11:19am

GOD is there, he will help me.
