To check the output of a single document in my collection of documents, how can I convert the part of the sparse matrix that belongs to only that document into a dense matrix and print it?
This is my code:
```python
from collections import Counter
from tqdm import tqdm
from scipy.sparse import csr_matrix
import math
import operator
from sklearn.preprocessing import normalize
import numpy

def transform(stringData, vocab):
    rows = []
    columns = []
    values = []
    if isinstance(stringData, (list,)):
        for idx, row in enumerate(tqdm(stringData)):  # for each document in the dataset
            # returns a dict where the key is the word and the value is its frequency, {word: frequency}
            word_freq = dict(Counter(row.split()))
            # for every unique word in the document
            for word, freq in word_freq.items():
                if len(word) < 2:
                    continue
                # check whether the word is in the vocabulary built in the fit() function
                # dict.get() returns the value, or -1 if the key doesn't exist
                col_index = vocab.get(word, -1)  # retrieve the column (dimension) index of the word
                # if the word exists
                if col_index != -1:
                    # store the index of the document
                    rows.append(idx)
                    # store the column index of the word
                    columns.append(col_index)
                    # store the frequency of the word
                    values.append(freq)
        matrix = csr_matrix((values, (rows, columns)), shape=(len(stringData), len(vocab)))
        return matrix
    else:
        print("you need to pass list of strings")
```
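The comments in `transform()` refer to a vocabulary built in a `fit()` function that isn't shown. As a rough sketch, assuming `vocab` is simply a dict mapping each word to a column index, a hypothetical `fit()` could look like this (it applies the same "skip words shorter than 2 characters" rule as `transform()`):

```python
def fit(docs):
    """Hypothetical fit(): map each unique word (length >= 2) to a column index."""
    words = set()
    for doc in docs:
        for word in doc.split():
            if len(word) >= 2:  # same length filter as transform()
                words.add(word)
    # sorting gives a stable, reproducible column order
    return {word: idx for idx, word in enumerate(sorted(words))}

print(fit(["the cat sat", "a cat ran"]))  # {'cat': 0, 'ran': 1, 'sat': 2, 'the': 3}
```

The resulting dict is what `transform(stringData, vocab)` expects as its second argument; `len(vocab)` becomes the number of columns.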
Well, what’s the definition of a sparse matrix, how is it different from a dense matrix, and what does that mean you need to do in order to go from one to the other?
(And to the smart alecks posting after me: yes, I'm familiar with the definitions; I'm asking the OP for a reason.)
A sparse matrix is a matrix that is comprised of mostly zero values.
Sparse matrices are distinct from matrices with mostly non-zero values, which are referred to as dense matrices.
Storing a sparse matrix as an ordinary dense array wastes space and time, because most of the stored values, and most of the operations on them, involve zeros.
The zero values can be ignored and only the data or non-zero values in the sparse matrix need to be stored or acted upon.
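To make the "only the non-zero values are stored" point concrete, here is a small illustration with SciPy's CSR format (the same `csr_matrix` type `transform()` returns). The 3×4 matrix is made up for the example:

```python
import numpy as np
from scipy.sparse import csr_matrix

# A made-up 3 x 4 matrix with only 3 non-zero entries
dense = np.array([[0, 0, 3, 0],
                  [0, 0, 0, 0],
                  [1, 0, 0, 2]])
sparse = csr_matrix(dense)

print(sparse.nnz)      # 3          -> number of stored (non-zero) values
print(sparse.data)     # [3 1 2]    -> the non-zero values, row by row
print(sparse.indices)  # [2 0 3]    -> column index of each stored value
print(sparse.indptr)   # [0 1 1 3]  -> row i's values live in data[indptr[i]:indptr[i+1]]

# Going back to dense just fills the missing positions with zeros again
restored = sparse.toarray()
print((restored == dense).all())  # True
```

Going from sparse to dense therefore means rebuilding the full grid, zeros included, from `data`, `indices`, and `indptr`.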
There are multiple data structures we can use to construct a sparse matrix. In the code above, I constructed the sparse matrix in CSR format, passing multiple documents as input. Now I need to check the output for a single document in my collection of documents.
I tried calling the todense() method on the matrix, but it does not work.
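Since the exact error isn't given, here is only a sketch of the usual approach. Calling `todense()` on the whole `csr_matrix` densifies every document at once and returns a `numpy.matrix`; to look at one document, select its row first (each row of the output of `transform()` is one document), then densify just that row with `toarray()`. One possible cause of a failure: `transform()` only returns a matrix when given a list, and otherwise prints a message and returns `None`, so `.todense()` on that result raises an `AttributeError`. The small matrix and `doc_index` below are stand-ins:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Stand-in for transform()'s output: 3 documents x 4 vocabulary words
matrix = csr_matrix(np.array([[0, 2, 0, 1],
                              [1, 0, 0, 0],
                              [0, 0, 3, 0]]))

doc_index = 0                  # hypothetical: the document to inspect
row = matrix[doc_index]        # still sparse: a 1 x 4 csr_matrix
dense_row = row.toarray()      # 2-D ndarray with shape (1, 4)
print(dense_row)               # [[0 2 0 1]]
print(dense_row.ravel())       # [0 2 0 1]  (flattened to 1-D)
```

Slicing before densifying keeps memory use proportional to one row, which matters when the vocabulary has tens of thousands of columns.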
```python
dataset2 = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
            "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]
```
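To connect those sentences to a matrix: each sentence becomes one row, each vocabulary word one column, and each cell holds a word count. The sketch below repeats the counting logic of `transform()` in compact form, with a vocabulary built by a hypothetical sorted-word rule (the real `fit()` isn't shown):

```python
from collections import Counter
from scipy.sparse import csr_matrix

dataset2 = ["the method of lagrange multipliers is the economists workhorse for solving optimization problems",
            "the technique is a centerpiece of economic theory but unfortunately its usually taught poorly"]

# Hypothetical vocabulary: every word of length >= 2, sorted, mapped to a column index
words = {w for doc in dataset2 for w in doc.split() if len(w) >= 2}
vocab = {w: i for i, w in enumerate(sorted(words))}

# Same (row, column, count) triplets that transform() collects
rows, cols, vals = [], [], []
for idx, doc in enumerate(dataset2):
    for word, freq in Counter(doc.split()).items():
        if word in vocab:
            rows.append(idx)
            cols.append(vocab[word])
            vals.append(freq)

counts = csr_matrix((vals, (rows, cols)), shape=(len(dataset2), len(vocab)))
print(counts.shape, counts.nnz)  # (2, 22) 25
```

With only two sentences the matrix is small and about half full (25 of 44 cells); the savings of a sparse format only show up with many documents and a large vocabulary.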
I’ve asked you to give me an example of your document collection, which is supposed to be a sparse matrix, and then take that example and show me what you think the result of making it dense should be.
You’ve given me a list that contains no nonzero elements, which does not constitute a sparse matrix, and when asked for the other half, given me a sentence rather than a matrix.
Either you have the wrong dataset, the wrong understanding of a matrix, or you’ve shown me the output without the input.
Step 1: Make sure the output of my implementation is a sparse matrix. Before generating the final output, I need to normalize the sparse matrix using L2 normalization.
Step 2: To check the output for a single document in my collection of documents, convert the row of the sparse matrix that belongs to only that document into a dense matrix and print it.
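Both steps can be sketched with `sklearn.preprocessing.normalize` (already imported in the code above), assuming "L2 normalization" means scaling each document row to unit Euclidean length. For a sparse CSR input, `normalize` returns a sparse matrix, so step 1's "output is still sparse" requirement holds. The small count matrix is made up:

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn.preprocessing import normalize

# Made-up counts: 2 documents x 3 vocabulary words
counts = csr_matrix(np.array([[3.0, 0.0, 4.0],
                              [0.0, 2.0, 0.0]]))

# Step 1: L2-normalize each row (axis=1 means per document); result stays sparse
normalized = normalize(counts, norm="l2", axis=1)
print(issparse(normalized))  # True

# Step 2: densify and print only one document's row
doc_index = 0  # hypothetical: the document to inspect
# row 0 becomes [0.6, 0, 0.8], since sqrt(3**2 + 4**2) = 5
print(normalized[doc_index].toarray())
```

Normalizing before densifying keeps the full matrix sparse; only the single row you print is ever converted to a dense array.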
Assuming we have the sparse matrix built by the code above, I need to do these two steps.
I was not given any sample of what the output should look like.
Then you haven’t learned the lesson you were taught.
I suggest you go back to your professor and tell them you’re struggling with the concepts, rather than blindly hoping someone will answer your homework for you so that you don’t have to learn something.
[To those that follow, I release this thread to the wilds.]