Sentiment Analysis with Python
This will print the top five results, as pictured below.

Data Cleaning and Tokenization
We’ll be building on previously gained knowledge on data cleaning and tokenization found in “ Getting Started with Natural Language Processing in Python ”. Data cleaning is the process of removing noise from our dataset, to help increase the accuracy of our prediction model. In the tokenization step, we convert each review into a list of words (tokens).
To help with data cleaning and tokenization, we need to download some NLTK resources:
Code snippet
nltk.download('punkt')nltk.download('wordnet')nltk.download('averaged_perceptron_tagger')nltk.download('stopwords')We use these resources for the following:
- Punkt: a pre-trained tokenizer for the English Language.
- Wordnet: a large lexical database of English. It’s needed by the
WordNetLemmatizerclass to lemmatize sentences. averaged_perceptron_tagger: a resource used for tagging words with their parts of speech.stopwords: a corpus containing 2,400 stopwords for 11 languages.
Let’s create a function to help clean and tokenize each movie review: