
E-commerce product scraping & multilingual sentiment analysis on Bangladeshi e-commerce reviews using Selenium, Transformers & TextBlob.


Az-main/Data-Wrangling-Project


🛒 E-Commerce Product Scraping & Sentiment Analysis

A Data Wrangling course project that scrapes product data and customer reviews from two Bangladeshi e-commerce platforms (Daraz and Pickaboo), then runs multilingual sentiment analysis (Bangla, Banglish, English) on those reviews.


📌 Project Overview

Phase 1: Web Scraping (Data Collection)

Automated scraping of product details and customer reviews using Selenium WebDriver.

Phase 2: Sentiment Analysis

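Multilingual sentiment classification of the collected reviews: TextBlob scores English reviews and Multilingual BERT scores Bangla and Banglish reviews, giving each review a polarity that reflects customer satisfaction.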


📁 Project Structure

├── scraping.py                  # Daraz smartphone listing scraper
├── product_links_collector.py   # Daraz earbuds product link collector
├── p1.py                        # Daraz TV product link collector
├── s1.py                        # Daraz TV product detail + review scraper
├── scraper.py                   # Pickaboo product detail scraper
├── reviews.py                   # Daraz review-only scraper
├── Scrap/
│   ├── daraz_scraper.py         # Daraz earbuds full scraper (with review pagination)
│   ├── earbuds_links.csv        # Collected earbuds product links
│   └── earbuds_final_continued.csv
├── Daraz/
│   └── thikase_daraz_links.csv  # Phone case product links
├── Sentimental Analysis/
│   ├── analysis.ipynb           # Sentiment analysis notebook
│   ├── for_model.csv            # Prepared data for model input
│   └── reviews_with_sentiment.csv  # Final output with sentiment labels
├── daraz_smartphones_page1.csv  # Scraped smartphone data
├── earbuds_daraz_products.csv   # Scraped earbuds data
├── mobile_accessories_pickaboo.csv  # Scraped Pickaboo accessories data
├── TV_daraz_products.csv        # Scraped TV product data
├── TV_links.csv                 # Collected TV product links
└── README.md

🛠️ Tech Stack

| Category | Tools |
|---|---|
| Language | Python |
| Web Scraping | Selenium WebDriver, ChromeDriver |
| Data Handling | Pandas, CSV |
| Sentiment Analysis | HuggingFace Transformers, TextBlob, NLTK |
| ML Models | nlptown/bert-base-multilingual-uncased-sentiment, sagorsarker/bangla-bert-base |
| Translation | deep-translator, indic-transliteration |

📊 Data Collected

| Category | Platform | Scope | File |
|---|---|---|---|
| Smartphones | Daraz | Page 1 listings | daraz_smartphones_page1.csv |
| TVs | Daraz | Full details + reviews | TV_daraz_products.csv |
| Earbuds | Daraz | Full details + reviews | earbuds_daraz_products.csv |
| Mobile Accessories | Pickaboo | Full details + reviews | mobile_accessories_pickaboo.csv |
| Phone Cases | Daraz | Links collected | Daraz/thikase_daraz_links.csv |

Fields Scraped per Product

  • Product name, brand, discounted price, listed price, discount %
  • Warranty info, seller name, availability
  • Average rating, rating count, image URL
  • Product description, customer reviews
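The listed price, discounted price, and discount % fields are related by simple arithmetic, which is handy for sanity-checking scraped rows. The sketch below is illustrative only: `parse_price` and `discount_percent` are hypothetical helpers, and the price format (a currency symbol followed by a comma-grouped number, such as `৳ 1,299`) is an assumption about how the sites render prices.

```python
import re


def parse_price(text: str) -> float:
    """Extract the first number from a price string (assumed format: '৳ 1,299')."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    # Drop thousands separators before converting.
    return float(match.group().replace(",", ""))


def discount_percent(listed: float, discounted: float) -> float:
    """Return the discount as a percentage of the listed price, rounded to 1 decimal."""
    if listed <= 0:
        raise ValueError("listed price must be positive")
    return round((listed - discounted) / listed * 100, 1)
```

Comparing `discount_percent(listed, discounted)` with the scraped discount % is one way to flag rows where a price was parsed incorrectly.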

🧠 Sentiment Analysis

The analysis notebook (Sentimental Analysis/analysis.ipynb) implements:

  1. Language Detection — Identifies Bangla, Banglish, or English reviews
  2. English Reviews — Analyzed using TextBlob polarity scoring
  3. Bangla/Banglish Reviews — Analyzed using Multilingual BERT (nlptown/bert-base-multilingual-uncased-sentiment)
  4. Star-to-Polarity Mapping — Converts 1–5 star predictions to -1.0 to +1.0 polarity scale
  5. Batch Processing — Processes entire CSV files and computes average polarity per product
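Steps 1 and 4 can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: `detect_language`, `star_to_polarity`, and the Banglish hint-word list are hypothetical, and the linear star mapping is just one plausible choice that covers the stated -1.0 to +1.0 range.

```python
import re

# Bengali Unicode block: U+0980 to U+09FF.
BANGLA_CHARS = re.compile(r"[\u0980-\u09FF]")

# Hypothetical hint words for romanized Bangla (Banglish). A real list would
# be far larger, or replaced by a trained language identifier.
BANGLISH_HINTS = {"bhalo", "valo", "kharap", "onek", "khub", "dam", "thik"}


def detect_language(text: str) -> str:
    """Return 'bangla', 'banglish', or 'english' for a review string."""
    if BANGLA_CHARS.search(text):
        return "bangla"
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & BANGLISH_HINTS:
        return "banglish"
    return "english"


def star_to_polarity(stars: int) -> float:
    """Map a 1-5 star prediction linearly onto [-1.0, 1.0] (assumed mapping)."""
    if not 1 <= stars <= 5:
        raise ValueError(f"stars must be between 1 and 5, got {stars}")
    return (stars - 3) / 2
```

The nlptown model returns labels of the form `1 star` to `5 stars`, so the star count can be read from the first token of the label before calling `star_to_polarity`.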

Sentiment Output

| Column | Description |
|---|---|
| original_comment | Raw review text |
| sentiment | Positive / Neutral / Negative |
| polarity | Score from -1.0 (most negative) to +1.0 (most positive) |
| confidence | Model confidence score |
| avg_polarity_per_row | Average polarity per product |
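To show how the `polarity` and `sentiment` columns could relate, and how a per-product average might be computed, here is a small sketch. The ±0.1 neutral band, the function names, and the `product` key are assumptions for illustration; the notebook may use different thresholds and is likely to do the grouping with pandas.

```python
from collections import defaultdict
from statistics import mean

NEUTRAL_BAND = 0.1  # assumed threshold, not taken from the notebook


def polarity_to_label(polarity: float) -> str:
    """Bucket a polarity score into Positive / Neutral / Negative."""
    if polarity > NEUTRAL_BAND:
        return "Positive"
    if polarity < -NEUTRAL_BAND:
        return "Negative"
    return "Neutral"


def average_polarity_per_product(rows):
    """Average the 'polarity' values of rows that share a 'product' key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["product"]].append(float(row["polarity"]))
    return {product: mean(values) for product, values in grouped.items()}
```

Rows can come from `csv.DictReader`, since string polarity values are converted with `float` before averaging.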

🚀 Setup & Usage

Prerequisites

pip install selenium pandas transformers textblob nltk deep-translator indic-transliteration

Running Scrapers

  1. Download ChromeDriver matching your Chrome version
  2. Update the driver path in the scraper scripts
  3. Run any scraper:
    python s1.py                # Scrape TV products
    python scraper.py           # Scrape Pickaboo products
    python Scrap/daraz_scraper.py  # Scrape earbuds

Running Sentiment Analysis

Open Sentimental Analysis/analysis.ipynb in Jupyter Notebook and run the cells.


👥 Contributors

  • Data Wrangling Project — 5th Trimester, Spring 2025

📄 License

This project is for academic purposes only.
