Uncovering Fake Reviews with Azure Cognitive Services and Azure OpenAI: A Case Study
The rise of online platforms has produced a flood of user-generated reviews. However, some of these reviews are false or fraudulent, a phenomenon known as Opinion Spam. This can include anything from promoting an unrelated website or blog to intentionally committing review fraud for financial gain.
Detecting Opinion Spam is crucial because it can mislead potential consumers into purchasing inferior products or avoiding superior ones. Consequently, companies are highly incentivized to automatically identify and remove such reviews. While sentiment analysis and intent recognition have received significant attention in Natural Language Processing (NLP), detecting Opinion Spam using text classification techniques has been relatively underexplored.
Some forms of Opinion Spam are easily recognizable to a human reader, such as advertisements, questions, and other non-opinionated texts. These fall under the category of Disruptive Opinion Spam: irrelevant statements that are apparent to readers and pose minimal risk, since users can simply ignore them. Identifying Deceptive Opinion Spam, however, is far more challenging. These texts are intentionally crafted to sound authentic and mislead the reader, commonly taking the form of fake reviews, either negative or positive, that aim to damage or boost a company's image. Because such reviews are deliberately created to deceive, human judges have little success in detecting them. There is therefore an urgent need to extract text patterns with meaningful substructures from these deceptive texts.
The approach to the problem involves text classification, which typically consists of two components: a feature extraction module and a classifier. The feature extraction module generates features from a given text sequence, while the classifier assigns class labels based on the corresponding features. Lexical and syntactic elements are commonly used as features, including total words, characters per word, frequency of large and unique words, function words or phrases, n-grams, bag-of-words (BOW), and Parts-Of-Speech (POS) tagging. Additionally, lexicon containment features can be used to indicate the presence or absence of a term from a lexicon in the text, expressed as a binary value where positive means “occurs” and negative means “doesn’t occur”.
While this methodology has benefits, it also has notable limitations. Managing the quality of the training set can be challenging, and creating a trustworthy classifier necessitates a substantial quantity of accurately labeled texts. Additionally, classifiers that rely on embeddings may be influenced by societal or individual perspectives in the training data, leading to incorrect conclusions. Furthermore, there may be cases where the algorithm’s conclusions are accurate on the training set but not applicable to new situations, posing significant difficulties for detecting Deceptive Opinion Spam.
To comprehend how falsehoods are conveyed in written material, we will explore alternative methodologies. To begin this investigation, we will first utilize the Azure Text Analytics client library for Python, which provides the following capabilities:
- Obtain confidence scores for the different facets of sentiment: positive, neutral, and negative.
- Evaluate the overall sentiment, which may not always be consistent. Certain reviews or analyses may contain both positive and negative elements.
Prerequisites

To follow along, you will need:

- Python 3.5 or later.
- An Azure subscription and a Cognitive Services or Text Analytics resource. For more information, see the Microsoft official documentation.
- The Azure Text Analytics library for Python, installed via pip: `pip install azure-ai-textanalytics`
- OpenCV for data visualization, likewise installed via pip: `pip install opencv-python`
- The dataset. This study uses the Ott Deceptive Opinion Spam corpus, a publicly available collection of 1,600 hotel reviews: 400 truthful and 400 gold-standard deceptive reviews for each sentiment polarity. The deceptive reviews were gathered through Amazon Mechanical Turk. Download the CSV file from the Kaggle competition.
Suppose we have two reviews, one truthful and the other deceptive, both expressing the same sentiment (positive or negative). The deceptive review, however, may contain certain patterns that are not easily detectable by humans, but can be recognized by a machine. Such reviews may include exaggerated expressions of emotions, like “I absolutely loved the service, it was simply outstanding”. Wouldn’t it be useful to visually analyze such comments and determine the extent of exaggeration? This can be achieved by following the steps outlined below.
Get Down to Business: Preparation and Setup
First, import all the required packages:
Next, create helpers that allow you to rapidly get a needed subset of Ott Corpus:
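A sketch of such helpers, assuming the column names (`deceptive`, `polarity`, `text`) of the Kaggle CSV; adjust them if your copy of the corpus differs:

```python
import csv

def load_reviews(path):
    """Read the Ott corpus CSV into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def filter_reviews(rows, deceptive, polarity):
    """Return the texts of reviews matching the given labels,
    e.g. filter_reviews(rows, "deceptive", "positive")."""
    return [r["text"] for r in rows
            if r["deceptive"] == deceptive and r["polarity"] == polarity]
```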
How Do You Like It: Create Sentiment Analyzer
Text Analytics is a cloud-based service offering advanced NLP capabilities on unprocessed text. Its primary features encompass the following functions:
- Sentiment Analysis
- Named Entity Recognition
- Linked Entity Recognition
- Personally Identifiable Information (PII) Entity Recognition
- Language Detection
- Key Phrase Extraction
- Multiple Analysis
- Healthcare Entities Analysis
The Text Analytics service uses predictive models to analyze documents, each of which is treated as a single unit. To perform an operation, you pass a list of documents, each represented as a string within the list.
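For example, a valid input is simply a Python list of review strings (the reviews here are made up):

```python
# Each string in the list is one document for the service to analyze.
documents = [
    "The hotel was lovely and the staff were very helpful.",
    "Terrible experience: the room was dirty and noisy.",
]
```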
The Sentiment Analysis subservice is the only requirement for this experiment. This subservice examines the input text and identifies whether the sentiment is positive, negative, neutral, or a mixture of these. The subservice provides a confidence score and per-sentence sentiment analysis in its response.
Let’s create a helper function, which returns three sentiment aspects of an input string:
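A sketch of such a helper, built on the `analyze_sentiment` method of `TextAnalyticsClient`. The SDK imports sit inside `make_client` so the scoring helper itself can be exercised without Azure credentials:

```python
def make_client(endpoint, key):
    """Create a Text Analytics client (requires azure-ai-textanalytics)."""
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.textanalytics import TextAnalyticsClient
    return TextAnalyticsClient(endpoint=endpoint,
                               credential=AzureKeyCredential(key))

def get_sentiment_scores(client, text):
    """Return the (positive, neutral, negative) confidence scores of one document."""
    # analyze_sentiment takes a list of documents and returns one result per document.
    result = client.analyze_sentiment([text])[0]
    scores = result.confidence_scores
    return scores.positive, scores.neutral, scores.negative
```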
To represent each review as a single pixel, we map its sentiment scores onto colour channels: blue represents neutrality, red negativity, and green positivity. The pixels are then merged to form a single image, so the corpus can be colorized according to sentiment. Note that the channel order is BGR (Blue, Green, Red) rather than RGB, since OpenCV is being utilized.
Bring Some Color: Highlight Comments
For this step, we’ll create some helper functions to convert our sentiments into pixel format:
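A possible sketch of these helpers, assuming the blue = neutral, green = positive, red = negative mapping described above (the 20-pixel row width is an arbitrary choice):

```python
import numpy as np

def sentiment_to_bgr(positive, neutral, negative):
    """Map sentiment scores in [0, 1] to a BGR pixel:
    blue = neutral, green = positive, red = negative."""
    return (int(neutral * 255), int(positive * 255), int(negative * 255))

def pixels_to_image(pixels, width=20):
    """Arrange a flat list of BGR pixels into a rectangular uint8 image array."""
    height = -(-len(pixels) // width)  # ceiling division
    img = np.zeros((height, width, 3), dtype=np.uint8)
    for i, px in enumerate(pixels):
        img[i // width, i % width] = px
    return img
```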
Here, we first convert the sentiment to a BGR format. We then generate an image out of these values.
Spot the Difference: Comparing the Visual Patterns
We are now ready to conduct our first experiment.
To begin with, we will run the sentiment analyzer on a filtered subset of the reviews, specifically those that are both deceptive and positive:
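One way to run the analyzer over the whole subset is to batch the requests; the batch size of 10 below reflects the documented per-request document limit of the synchronous sentiment endpoint:

```python
def analyze_in_batches(client, texts, batch_size=10):
    """Collect (positive, neutral, negative) scores for every text, in batches."""
    scores = []
    for i in range(0, len(texts), batch_size):
        for doc in client.analyze_sentiment(texts[i:i + batch_size]):
            cs = doc.confidence_scores
            scores.append((cs.positive, cs.neutral, cs.negative))
    return scores
```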
To avoid rerunning the time-consuming sentiment analysis, we store the results on disk. This also lets us easily revisit the comments later:
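One simple way to cache the scores is NumPy's `save`/`load` (the file name is arbitrary):

```python
import numpy as np

def save_scores(path, scores):
    """Persist the (positive, neutral, negative) tuples so the API isn't re-queried."""
    np.save(path, np.asarray(scores, dtype=np.float32))

def load_scores(path):
    """Reload previously cached scores as an (n, 3) array."""
    return np.load(path)
```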
To compare the results, you should repeat the same procedure for the other subsets – truthful and positive, deceptive and negative, and so on. By doing this, you will have a comprehensive analysis of all the different combinations of truthfulness and sentiment.
As you may notice from the four pictures above, negative deceptive reviews are brighter, with fewer green spots. These characteristics suggest there is some exaggeration in fake comments. The same pattern can be observed in positive reviews: the colors are much more “juicy”, with fewer red spots. We can see how certain services are being falsely flattered. Truthful reviews, on the other hand, tend to look more measured.
Smoke and Mirrors: Get the Color of Deception
The objective of this stage is to obtain a single shade that summarizes the deceit. This can be achieved by averaging each of the three channels across all pixels and recombining the averages into one BGR colour.
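The averaging step can be as simple as the following sketch, which accepts any `(h, w, 3)` uint8 array such as the images produced earlier:

```python
import numpy as np

def mean_color(img):
    """Average each of the three channels over all pixels into one BGR colour."""
    return tuple(int(c) for c in img.reshape(-1, 3).mean(axis=0))
```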
After combining all the functions, we arrive at the following script:
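A hedged end-to-end sketch of such a script (the file names, environment variables, CSV column names, and the 20-pixel row width are assumptions; running it requires `azure-ai-textanalytics`, `numpy`, `opencv-python`, and valid credentials):

```python
def score_to_pixel(positive, neutral, negative):
    # Blue = neutral, green = positive, red = negative (BGR order for OpenCV).
    return (int(neutral * 255), int(positive * 255), int(negative * 255))

def main():
    import os, csv
    import numpy as np
    import cv2
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.textanalytics import TextAnalyticsClient

    client = TextAnalyticsClient(
        endpoint=os.environ["AZURE_LANGUAGE_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_LANGUAGE_KEY"]),
    )

    # Column names below match the Kaggle copy of the Ott corpus.
    with open("deceptive-opinion.csv", newline="", encoding="utf-8") as f:
        texts = [r["text"] for r in csv.DictReader(f)
                 if r["deceptive"] == "deceptive" and r["polarity"] == "positive"]

    # The synchronous sentiment endpoint accepts at most 10 documents per call.
    pixels = []
    for i in range(0, len(texts), 10):
        for doc in client.analyze_sentiment(texts[i:i + 10]):
            cs = doc.confidence_scores
            pixels.append(score_to_pixel(cs.positive, cs.neutral, cs.negative))

    # Lay the pixels out as a rectangle and upscale so each review is visible.
    width = 20
    height = -(-len(pixels) // width)
    img = np.zeros((height, width, 3), dtype=np.uint8)
    for i, px in enumerate(pixels):
        img[i // width, i % width] = px
    cv2.imwrite("deceptive_positive.png",
                cv2.resize(img, (width * 20, height * 20),
                           interpolation=cv2.INTER_NEAREST))

if __name__ == "__main__":
    main()
```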
To obtain the desired outcome, it is necessary to perform the same process on both truthful and negative reviews. This will lead to the following result:
It appears that honest negative reviews do not stand out as much as deceitful ones, and even fake positive reviews are more noticeable than genuine comments.
Alternate Option: Utilize Azure’s OpenAI
The Azure OpenAI Service allows users to access OpenAI’s language models, such as GPT-4, GPT-3.5-Turbo, and Embeddings model series, via REST API. These models are highly powerful and customizable for various tasks, including content generation, summarization, semantic search, and natural language to code translation. The latest GPT-4 and GPT-3.5-Turbo model series are now available for general use. Users can interact with the service through REST APIs, Python SDK, or the web-based interface in Azure OpenAI Studio.
The heart of the API service is the completions endpoint. It offers users access to the model’s text-in, text-out interface. By providing an input prompt containing an English text command, users can generate a text completion with ease.
Here’s an example of a simple prompt and completion:
```
Prompt: """ count to 5 in a for loop """

Completion:
for i in range(1, 6):
    print(i)
```
Consequently, an alternative approach to detecting the sentiment of a piece of text is by utilizing the Azure OpenAI service. This service can analyze the text and provide an assessment of the overall sentiment expressed within it.
Such a prompt determines only the general sentiment; unlike the opinion mining feature used earlier, it does not break the text into specific aspects. However, you can make the prompt more effective by rephrasing it to include three examples, one each for positive, negative, and neutral sentiment.
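A few-shot version of such a prompt might be assembled like this (the example shots, deployment name, and environment variables are all assumptions; the API call requires the `openai` package and an Azure OpenAI resource):

```python
# Hypothetical few-shot examples, one per sentiment class.
FEW_SHOT = [
    ("I love this hotel, the staff were wonderful.", "positive"),
    ("The room was dirty and the service was terrible.", "negative"),
    ("The hotel is located downtown.", "neutral"),
]

def build_prompt(review):
    """Assemble a few-shot sentiment-classification prompt."""
    lines = ["Classify the sentiment of each review as positive, negative or neutral.", ""]
    for text, label in FEW_SHOT:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {review}\nSentiment:")
    return "\n".join(lines)

def classify(review, deployment="gpt-35-turbo"):
    """Send the prompt to an Azure OpenAI chat deployment and return the label."""
    import os
    from openai import AzureOpenAI
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_KEY"],
        api_version="2024-02-01",
    )
    resp = client.chat.completions.create(
        model=deployment,
        messages=[{"role": "user", "content": build_prompt(review)}],
        max_tokens=3,
        temperature=0,
    )
    return resp.choices[0].message.content.strip()
```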
In this article, we employed sentiment analysis to identify the typical patterns found in fraudulent reviews. To achieve this, we utilized the Text Analytics API’s opinion mining feature, which enabled us to extract various sentiment aspects from the input text, including positive, negative, and neutral polarities. As illustrated in the above example, the contrast in colors between deceptive and honest reviews is noticeable. These differences enable us to conclude that we can assess the degree of hyperbole by visualizing the sentiments.
The code used for this project is available on GitHub, which you can download and test at your convenience.