Check the output of a single document in a collection of documents

To check the output of a single document in my collection of documents, how can I convert the part of the sparse matrix that belongs only to that document into a dense matrix and print it?

This is my code:

```python
from collections import Counter
from tqdm import tqdm
from scipy.sparse import csr_matrix
import math
import operator
from sklearn.preprocessing import normalize
import numpy


def transform(stringData, vocab):
    rows = []
    columns = []
    values = []
    if isinstance(stringData, list):
        for idx, row in enumerate(tqdm(stringData)):  # for each document in the dataset
            # Counter gives a {word: frequency} mapping for this document
            word_freq = dict(Counter(row.split()))
            for word, freq in word_freq.items():  # for each unique word in the document
                if len(word) < 2:
                    continue
                # look the word up in the vocabulary that we built in the fit() function;
                # dict.get() returns -1 here if the key doesn't exist
                col_index = vocab.get(word, -1)  # retrieving the column index of the word
                # if the word exists in the vocabulary
                if col_index != -1:
                    rows.append(idx)           # index of the document
                    columns.append(col_index)  # column (dimension) of the word
                    values.append(freq)        # frequency of the word
        matrix = csr_matrix((values, (rows, columns)), shape=(len(stringData), len(vocab)))
        return matrix
    else:
        # note: this branch returns None, not a matrix
        print("you need to pass list of strings")
```

Well, what’s the definition of a sparse matrix, how is it different from a dense matrix, and what does that mean you need to do in order to go from one to the other?

(And to the smart alec posters coming behind me: yes, I’m familiar with the definitions; I’m asking the OP for a reason.)

A sparse matrix is a matrix that is composed mostly of zero values.
Sparse matrices are distinct from matrices with mostly non-zero values, which are referred to as dense matrices.
Storing and processing a sparse matrix as if it were dense can cause problems with regard to space and time complexity.
The zero values can be ignored, and only the data or non-zero values in the sparse matrix need to be stored or acted upon.
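To make that definition concrete, here is a minimal sketch using scipy's CSR format (the same format the code above builds). The small array is made up purely for illustration; it shows that only the non-zero values and their positions are stored, and that `toarray()` fills the zeros back in.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A made-up 3x4 matrix with only three non-zero entries.
dense = np.array([[0, 0, 3, 0],
                  [0, 0, 0, 0],
                  [1, 0, 0, 2]])

sparse = csr_matrix(dense)

print(sparse.nnz)      # how many entries are actually stored
print(sparse.data)     # the stored (non-zero) values
print(sparse.indices)  # the column index of each stored value
print(sparse.indptr)   # where each row starts and ends inside data/indices

# Going back to dense just re-creates the zeros.
round_trip = sparse.toarray()
print(round_trip)
```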

So as a programmer, what do you need to do to go from sparse to dense?

There are multiple data structures we can use to construct a sparse matrix. In my code given above, I have constructed the sparse matrix from multiple documents given as input. Now I need to check the output of a single document in my collection of documents.

I tried calling the todense method on the matrix, but it did not work.
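For anyone landing here later, a minimal sketch of the usual way to look at one document's row. It assumes the matrix really is a scipy `csr_matrix`; the tiny `counts` matrix and `doc_index` are made up for illustration and are not the OP's data. Worth checking first: the `transform` above returns `None` when it is not given a list, and calling a method on `None` fails, though the thread does not say whether that is what happened here.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 2-document, 4-word count matrix, built the same way as in transform().
rows = [0, 0, 1, 1, 1]
cols = [0, 2, 1, 2, 3]
vals = [1, 2, 1, 1, 1]
counts = csr_matrix((vals, (rows, cols)), shape=(2, 4))

doc_index = 1                      # which document to inspect (assumed)
row_sparse = counts[doc_index]     # still sparse, shape (1, 4)
row_dense = row_sparse.toarray()   # plain numpy array, zeros filled in

print(row_sparse.shape)
print(row_dense)
```

`todense()` also exists on scipy sparse matrices; it returns a `numpy.matrix` rather than an `ndarray`, so `toarray()` is usually the more convenient of the two.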

Well that was a lot of words that had nothing to do with how you would go from a sparse matrix to a dense one.

Let’s try that again.

Give me an example of your document collection, and what you would expect the resultant dense matrix to look like.

input dataset

dataset2 = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
            "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]

Is that the before or the after? What’s the other half of the operation?

The first half of the operation is writing a custom fit method.

So… no.

I’ve asked you to give me an example of your document collection, which is supposed to be a sparse matrix, and then take that example and show me what you think the result of making it dense should be.

You’ve given me a list that contains no nonzero elements, which does not constitute a sparse matrix, and when asked for the other half, given me a sentence rather than a matrix.

Either you have the wrong dataset, the wrong understanding of a matrix, or you’ve shown me the output without the input.

input
strings = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
           "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]
vocab = fit(strings)
print(list(vocab.keys()))
matrix1= transform(strings, vocab).toarray()
print(matrix1)
=============================
output 

[[0 0 0 1 1 1 0 1 1 1 1 1 0 1 1 0 0 2 0 0 0 1]
 [1 1 1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 1 1 1 1 0]]

I was comparing this output with the output I got from sklearn's CountVectorizer.
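A sketch of that comparison, for reference. It runs sklearn's `CountVectorizer` with default settings on the same two strings and checks the result against the array printed above (copied here as `expected`). The defaults sort the vocabulary alphabetically and drop one-character tokens such as "a", which lines up with the `len(word) < 2` check in the custom `transform`; whether the custom `fit` orders its vocabulary the same way is an assumption, since that function is not shown in the thread.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

strings = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
           "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]

vectorizer = CountVectorizer()
sk_counts = vectorizer.fit_transform(strings)   # scipy sparse matrix of counts

# Vocabulary words ordered by their column index.
feature_names = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(feature_names)

# The array printed by the custom implementation above.
expected = np.array([
    [0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0],
])

matches = np.array_equal(sk_counts.toarray(), expected)
print(matches)
```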

So there’s your input. Not particularly sparse, but we’ll call it sparse for the moment.

Now what should it look like once it’s dense?

Step 1) I have to make sure the output of my implementation is a sparse matrix. Before generating the final output, I need to normalize my sparse matrix using L2 normalization.
Step 2) To check the output of a single document in my collection of documents, I need to convert the sparse matrix row for only that document into a dense matrix and print it.
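The two steps above can be sketched roughly as follows. This is only an illustration under assumptions: the small `counts` matrix stands in for the output of `transform`, and sklearn's `normalize` (already imported in the original code) is used for the L2 step. It is not necessarily what the assignment expects.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn.preprocessing import normalize

# Stand-in for the count matrix returned by transform() (made-up values).
counts = csr_matrix(np.array([[3.0, 0.0, 4.0],
                              [0.0, 2.0, 0.0]]))

# Step 1: L2-normalize each row; the result is still sparse.
normalized = normalize(counts, norm="l2", axis=1)
print(issparse(normalized))

# Step 2: pick one document's row and convert only that row to dense.
first_doc = normalized[0].toarray()
print(first_doc)

# Each row should now have length 1 in the L2 sense.
row_norms = np.sqrt(normalized.multiply(normalized).sum(axis=1))
print(row_norms)
```

Slicing before calling `toarray()` means only one row is ever expanded, which matters once the vocabulary gets large.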

Now, assuming we have the sparse matrix as given above, we need to do these 2 steps.
I have not been given any sample of what it should look like.

You have successfully read off your homework paper.

Now answer my question.

What should this example look like when you’re done? Do it manually.

I will do it

Then you haven’t learned the lesson you were taught.

I suggest you go back to your professor and tell them you’re struggling with the concepts, rather than blindly hoping someone will answer your homework for you so that you don’t have to learn something.

[To those that follow, I release this thread to the wilds.]

GOD is there, he will help me.
