In the digital age, customer interactions are increasingly mediated through audio channels, with customer service centers recording thousands of conversations daily. These audio files represent a rich, yet often untapped, reservoir of insights into customer satisfaction, recurring issues, and evolving emotional states during service interactions. Manually sifting through this vast amount of data is a Sisyphean task, but the advent of sophisticated Artificial Intelligence (AI) offers a powerful solution. This article details a comprehensive project leveraging open-source AI tools to automatically transcribe calls, analyze customer sentiment, detect emotions, and extract recurring topics, all processed locally on a user’s machine, thereby safeguarding sensitive customer data.
The motivation behind developing such a local AI solution stems from the inherent limitations and concerns associated with cloud-based AI services. While platforms like OpenAI’s API offer robust capabilities, their reliance on external servers raises significant privacy concerns, especially when dealing with personal customer information. Furthermore, the per-API-call pricing model can quickly escalate costs for high-volume operations, and users are often subject to provider-imposed rate limits. By processing data locally, this project addresses these challenges, ensuring compliance with data residency requirements and providing a cost-effective, privacy-centric approach to customer interaction analysis. The system’s architecture, as depicted in Figure 2, emphasizes a modular design where each component is optimized for a specific task, making it easy to understand, test, and extend.
Project Setup and Local Deployment
Initiating the project involves cloning the provided GitHub repository and setting up a dedicated virtual environment to manage dependencies. This ensures a clean and reproducible development environment. The core dependencies, listed in requirements.txt, include essential libraries for audio processing, natural language processing, and dashboard development.
The initial setup requires cloning the repository:

```bash
git clone https://github.com/zenUnicorn/Customer-Sentiment-analyzer.git
```

Following this, users are instructed to create and activate a Python virtual environment. For Windows users, this would typically involve:

```bash
python -m venv venv
venv\Scripts\activate
```

On macOS and Linux systems, the commands are:

```bash
python3 -m venv venv
source venv/bin/activate
```

Once the environment is active, all necessary libraries can be installed:

```bash
pip install -r requirements.txt
```
A crucial aspect of the initial setup is the one-time download of AI models, totaling approximately 1.5GB. Once downloaded, these models operate entirely offline, enabling continuous analysis without an internet connection. The successful installation is typically indicated by output similar to that shown in Figure 3, confirming that all dependencies have been met and the system is ready for operation.
Whisper: The Engine of Audio Transcription
The foundational step in analyzing customer conversations is converting spoken words into accurate text. For this purpose, the project employs Whisper, an advanced automatic speech recognition (ASR) system developed by OpenAI. Whisper’s effectiveness lies in its Transformer-based encoder-decoder architecture, trained on an extensive dataset of 680,000 hours of multilingual audio. This extensive training allows it to achieve remarkable accuracy, even in the presence of background noise or diverse accents.

Whisper processes audio by first converting it into a mel spectrogram. This representation visualizes sound as a 2D image, where the x-axis denotes time, the y-axis represents frequency, and color intensity signifies volume. This "visual" representation of sound allows the Transformer model to process audio much like it processes image data, enabling a nuanced understanding of vocal patterns. The system then decodes this spectrogram into a highly accurate text transcript, segmenting the audio into manageable chunks with associated timestamps, as illustrated in Figure 4.
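The mel scale underpinning that spectrogram is a simple logarithmic mapping of frequency onto human pitch perception. As a minimal illustration (plain Python, not part of the project's code), the standard Hz-to-mel conversion and its inverse look like this:

```python
import math

def hz_to_mel(f_hz):
    # Standard mel-scale formula: roughly linear below ~1 kHz,
    # logarithmic above, mirroring how humans perceive pitch.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse mapping, used when placing mel filter-bank edges.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Equal steps on the mel axis correspond to growing steps in Hz,
# so the spectrogram devotes more resolution to low frequencies,
# where most speech information lives.
for f in (100, 1000, 8000):
    print(f, round(hz_to_mel(f), 1))
```

Because the high-frequency range is compressed, two tones an octave apart at 4 kHz sit much closer on the mel axis than two tones an octave apart at 200 Hz.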
The project’s implementation of Whisper is encapsulated within the AudioTranscriber class. Users can select different model sizes, each offering a trade-off between speed and accuracy:
| Model | Parameters | Speed | Best For |
|---|---|---|---|
| tiny | 39M | Fastest | Quick testing |
| base | 74M | Fast | Development |
| small | 244M | Medium | Production |
| large | 1550M | Slow | Maximum accuracy |
For most practical applications, the base or small models provide an optimal balance between performance and computational resources. The core transcription logic is straightforward:
```python
import whisper

class AudioTranscriber:
    def __init__(self, model_size="base"):
        self.model = whisper.load_model(model_size)

    def transcribe_audio(self, audio_path):
        result = self.model.transcribe(
            str(audio_path),
            word_timestamps=True,
            condition_on_previous_text=True
        )
        return {
            "text": result["text"],
            "segments": result["segments"],
            "language": result["language"]
        }
```
Sentiment Analysis with Advanced Transformers
Once the audio has been transcribed into text, the next critical step is to understand the underlying sentiment and emotions expressed by the customer. The project leverages Hugging Face Transformers, a powerful library for natural language processing, utilizing CardiffNLP’s twitter-roberta-base-sentiment-latest model. This RoBERTa-based model, fine-tuned on social media text, is particularly adept at understanding the nuances of conversational language, making it ideal for analyzing customer calls.
Sentiment analysis classifies text into discrete categories: positive, neutral, or negative. Unlike simpler keyword-matching methods, Transformer models like RoBERTa excel by understanding context, sarcasm, and idiomatic expressions. The process involves tokenizing the input text, feeding it through the Transformer network, and applying a softmax activation function to the final layer. This function outputs probabilities for each sentiment class, which sum to 1. For instance, a probability distribution of 0.85 for positive, 0.10 for neutral, and 0.05 for negative clearly indicates an overall positive sentiment.
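To make the softmax step concrete, here is a self-contained sketch in plain Python (the logit values are invented for illustration; a real model produces them from the tokenized text):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability, exponentiate,
    # then normalize so the outputs form a probability distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for [negative, neutral, positive].
probs = softmax([0.2, 1.0, 3.1])
print([round(p, 2) for p in probs])
```

Whatever the logits, the three probabilities are positive and sum to 1, which is exactly what makes the mutually exclusive sentiment labels comparable.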

The SentimentAnalyzer class implements this functionality:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import torch.nn.functional as F

class SentimentAnalyzer:
    def __init__(self):
        model_name = "cardiffnlp/twitter-roberta-base-sentiment-latest"
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)

    def analyze(self, text):
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():  # inference only; no gradients needed
            outputs = self.model(**inputs)
        probabilities = F.softmax(outputs.logits, dim=1)
        labels = ["negative", "neutral", "positive"]
        scores = {label: float(prob) for label, prob in zip(labels, probabilities[0])}
        return {
            "label": max(scores, key=scores.get),
            "scores": scores,
            "compound": scores["positive"] - scores["negative"]
        }
```
The compound score, ranging from -1 (highly negative) to +1 (highly positive), provides a convenient metric for tracking sentiment shifts over time and across different customer interactions. This is a significant improvement over lexicon-based methods like VADER, which can falter when encountering complex linguistic structures or nuanced expressions. For example, a sentence like "I’m not unhappy with the service" would be correctly interpreted as positive by a Transformer model, whereas a simple lexicon approach might incorrectly flag it as negative due to the presence of "unhappy."
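A quick illustration of the compound metric (the score dictionaries below are hypothetical, shaped like the analyzer's output):

```python
# Hypothetical score dictionaries for two calls.
happy = {"negative": 0.05, "neutral": 0.10, "positive": 0.85}
upset = {"negative": 0.70, "neutral": 0.20, "positive": 0.10}

def compound(scores):
    # Collapses the three-way distribution into one signed
    # number in [-1, 1] for trend tracking across calls.
    return scores["positive"] - scores["negative"]

print(round(compound(happy), 2))   # strongly positive
print(round(compound(upset), 2))   # strongly negative
```

Plotting this single number per call over time gives a simple sentiment trend line without having to juggle three separate probability series.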
Topic Extraction with BERTopic
Beyond understanding sentiment, identifying the core topics of discussion in customer calls is crucial for strategic decision-making. BERTopic is an innovative library that automatically discovers latent themes within a corpus of text without requiring pre-defined topic categories. This unsupervised approach is highly beneficial for uncovering emergent issues or trends that might not be anticipated.
BERTopic works by leveraging sentence embeddings generated by Transformer models. These embeddings capture the semantic meaning of sentences, allowing the model to group similar statements together. It then applies dimensionality reduction techniques (like UMAP) to visualize these embeddings and clustering algorithms (like HDBSCAN) to identify dense regions, which correspond to distinct topics. Finally, a class-based TF-IDF (c-TF-IDF) algorithm is used to extract the most representative keywords for each identified topic.
This method offers a significant advantage over older techniques such as Latent Dirichlet Allocation (LDA), which primarily relies on word co-occurrence. BERTopic’s semantic understanding ensures that phrases with similar meanings, like "shipping delay" and "late delivery," are correctly associated with the same topic.
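The intuition behind that semantic grouping can be sketched with cosine similarity. Real sentence embeddings have hundreds of dimensions; the three-dimensional vectors below are invented purely for illustration:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: near 1 means
    # the embeddings point the same way (similar meaning).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" (made-up values, not real model output):
shipping_delay = [0.90, 0.80, 0.10]
late_delivery  = [0.85, 0.75, 0.15]
billing_issue  = [0.10, 0.20, 0.95]

print(round(cosine_similarity(shipping_delay, late_delivery), 3))  # high
print(round(cosine_similarity(shipping_delay, billing_issue), 3))  # low
```

Clustering in this embedding space is what lets HDBSCAN place "shipping delay" and "late delivery" in the same topic even though they share no words.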

The TopicExtractor class in the project handles topic modeling:
```python
from bertopic import BERTopic

class TopicExtractor:
    def __init__(self):
        self.model = BERTopic(
            embedding_model="all-MiniLM-L6-v2",
            min_topic_size=2,
            verbose=True
        )

    def extract_topics(self, documents):
        topics, probabilities = self.model.fit_transform(documents)
        topic_info = self.model.get_topic_info()
        topic_keywords = {
            topic_id: self.model.get_topic(topic_id)[:5]
            for topic_id in set(topics) if topic_id != -1  # -1 is the outlier cluster
        }
        return {
            "assignments": topics,
            "keywords": topic_keywords,
            "distribution": topic_info
        }
```
It is important to note that topic extraction is most effective when applied to a collection of documents (at least 5-10 calls are recommended) to discern meaningful patterns. Individual calls can then be analyzed using the fitted model. The output provides topic assignments for each document, representative keywords for each topic, and a distribution of topics, as visualized in Figure 5.
Interactive Dashboard with Streamlit
To make the extracted insights accessible and actionable for business users, the project incorporates a Streamlit-based dashboard. Streamlit transforms Python scripts into interactive web applications with minimal coding effort. The dashboard offers a user-friendly interface for uploading audio files, visualizing sentiment trends, mapping emotion distributions, and exploring topic clusters.
The app.py script orchestrates the dashboard’s functionality. Key features include:
- File Upload: Users can upload multiple audio files (MP3 or WAV format) for analysis.
- Sentiment Gauge: A visual representation of the overall sentiment distribution across analyzed calls.
- Emotion Radar Chart: Displays the prevalence of different detected emotions during customer interactions.
- Topic Distribution: Bar charts illustrating the frequency of identified topics, such as billing issues or technical support queries.
- Detailed Call Analysis: Individual call transcripts and sentiment scores are presented upon request.
The core Streamlit application structure is as follows:

```python
import streamlit as st

def main():
    st.title("Customer Sentiment Analyzer")
    uploaded_files = st.file_uploader(
        "Upload Audio Files",
        type=["mp3", "wav"],
        accept_multiple_files=True
    )
    if uploaded_files and st.button("Analyze"):
        with st.spinner("Processing..."):
            results = pipeline.process_batch(uploaded_files)
        # Display results
        col1, col2 = st.columns(2)
        with col1:
            st.plotly_chart(create_sentiment_gauge(results))
        with col2:
            st.plotly_chart(create_emotion_radar(results))
```
Performance Optimization with Caching
Because Streamlit re-runs the entire script on every user interaction, efficient model loading is essential. The @st.cache_resource decorator ensures that computationally intensive models are loaded only once and persist across user sessions, significantly improving the dashboard’s responsiveness.
```python
@st.cache_resource
def load_models():
    return CallProcessor()

processor = load_models()
```
When a user uploads a file, a spinner indicates that processing is underway, followed by the immediate display of results, enhancing the user experience.
```python
if uploaded_file:
    with st.spinner("Transcribing and analyzing..."):
        result = processor.process_file(uploaded_file)
    st.success("Done!")
    st.write(result["text"])
    st.metric("Sentiment", result["sentiment"]["label"])
```
The dashboard’s visualizations, powered by Plotly, are interactive, allowing users to hover for details, zoom into specific timeframes, and toggle data series, thereby transforming raw analytics into actionable business intelligence.
Practical Lessons and Application Execution
The project offers several practical insights into modern AI applications. Whisper’s mel spectrogram processing highlights how AI can mimic human auditory perception for robust audio analysis. The choice between softmax and sigmoid activation functions in neural networks is critical; softmax is ideal for multi-class classification problems (like sentiment analysis with mutually exclusive categories), while sigmoid is used for binary classification or multi-label classification where multiple outputs can be true simultaneously.
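The softmax-versus-sigmoid distinction is easy to demonstrate with a few made-up logits:

```python
import math

logits = [2.0, 0.5, -1.0]

# Softmax: mutually exclusive classes (e.g. sentiment); the
# outputs compete with each other and always sum to 1.
m = max(logits)
exps = [math.exp(x - m) for x in logits]
softmax_out = [e / sum(exps) for e in exps]

# Sigmoid: each output is an independent probability (e.g.
# multi-label emotion detection); the sum can exceed 1.
sigmoid_out = [1.0 / (1.0 + math.exp(-x)) for x in logits]

print(round(sum(softmax_out), 6))  # always 1.0
print(round(sum(sigmoid_out), 2))  # not constrained to 1
```

This is why a sentiment head (one label per call) ends in softmax, while an emotion head that may flag both "anger" and "frustration" on the same call ends in sigmoids.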

To run the application locally, users follow the initial setup steps. The system offers several modes of operation:
- Text Analysis: A command-line interface allows testing the NLP models with sample text.
- Single Audio Analysis:
  python main.py --audio path/to/call.mp3
- Batch Processing:
  python main.py --batch data/audio/
- Interactive Dashboard:
  python main.py --dashboard

The dashboard mode launches the web application, accessible at http://localhost:8501 in a web browser, providing the full interactive experience. Figure 8 shows an example of successful analysis output in the terminal, including sentiment scores.
Conclusion
The developed system represents a complete, offline-capable solution for transcribing customer calls, analyzing sentiment and emotions, and extracting recurring topics using exclusively open-source tools. This robust, production-ready foundation can significantly enhance customer service operations by providing actionable insights into customer satisfaction, identifying areas for improvement in products and services, and enabling proactive issue resolution. The local processing model ensures data privacy and eliminates recurring API costs, making it an economically viable and secure solution for businesses of all sizes. The complete code is publicly available on GitHub, inviting developers and businesses to explore and adapt this powerful AI-driven approach to customer interaction analysis.
The project’s creator, Shittu Olumide, a software engineer and technical writer, demonstrates a commitment to leveraging cutting-edge technologies for practical applications and clear communication. His work on this project exemplifies the potential of open-source AI to democratize advanced analytical capabilities.

