🛒 E-Commerce Product Scraping & Sentiment Analysis

A Data Wrangling course project that scrapes product data and customer reviews from two Bangladeshi e-commerce platforms, Daraz and Pickaboo. It then runs sentiment analysis on the reviews, which are written in Bangla, Banglish (Bangla written in Latin script), and English.

📌 Project Overview

Phase 1: Web Scraping (Data Collection)

Automated scraping of product details and customer reviews using Selenium WebDriver.

Phase 2: Sentiment Analysis

Sentiment classification of the reviews to measure the polarity of customer satisfaction. English reviews use TextBlob and Bangla/Banglish reviews use a multilingual BERT model.

📁 Project Structure

├── scraping.py                  # Daraz smartphone listing scraper
├── product_links_collector.py   # Daraz earbuds product link collector
├── p1.py                        # Daraz TV product link collector
├── s1.py                        # Daraz TV product detail + review scraper
├── scraper.py                   # Pickaboo product detail scraper
├── reviews.py                   # Daraz review-only scraper
├── Scrap/
│   ├── daraz_scraper.py         # Daraz earbuds full scraper (with review pagination)
│   ├── earbuds_links.csv        # Collected earbuds product links
│   └── earbuds_final_continued.csv
├── Daraz/
│   └── thikase_daraz_links.csv  # Phone case product links
├── Sentimental Analysis/
│   ├── analysis.ipynb           # Sentiment analysis notebook
│   ├── for_model.csv            # Prepared data for model input
│   └── reviews_with_sentiment.csv  # Final output with sentiment labels
├── daraz_smartphones_page1.csv  # Scraped smartphone data
├── earbuds_daraz_products.csv   # Scraped earbuds data
├── mobile_accessories_pickaboo.csv  # Scraped Pickaboo accessories data
├── TV_daraz_products.csv        # Scraped TV product data
├── TV_links.csv                 # Collected TV product links
└── README.md
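Several scrapers work in two stages. A link collector (`p1.py`, `product_links_collector.py`) saves product URLs to a CSV such as `TV_links.csv`, and a detail scraper (`s1.py`, `Scrap/daraz_scraper.py`) then visits each link. The sketch below shows one way to prepare that link list. It is an illustration only and is not taken from the repository's scripts. It assumes that query strings and fragments carry only tracking data, so `…?spm=…` and `…#reviews` point to the same product page. The `product_link` column name and the example URLs are hypothetical.

```python
import csv
import io
from typing import Iterable, List, TextIO
from urllib.parse import urlsplit, urlunsplit


def normalize_link(url: str) -> str:
    """Drop the query string and fragment (assumed to hold only tracking data)."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def dedupe_links(raw_links: Iterable[str]) -> List[str]:
    """Normalize links and keep the first occurrence of each, preserving order."""
    seen = set()
    unique: List[str] = []
    for url in raw_links:
        link = normalize_link(url)
        if link and link not in seen:  # skip blanks and repeats
            seen.add(link)
            unique.append(link)
    return unique


def write_links_csv(links: Iterable[str], fh: TextIO) -> None:
    """Write one link per row under a hypothetical 'product_link' header."""
    writer = csv.writer(fh)
    writer.writerow(["product_link"])
    for link in links:
        writer.writerow([link])


if __name__ == "__main__":
    # Hypothetical URLs: two variants of one product page plus a second product.
    raw = [
        "https://www.daraz.com.bd/products/example-tv-i1.html?spm=tracking",
        "https://www.daraz.com.bd/products/example-tv-i1.html#reviews",
        "https://www.daraz.com.bd/products/example-earbuds-i2.html",
    ]
    buffer = io.StringIO()
    write_links_csv(dedupe_links(raw), buffer)
    print(buffer.getvalue())
```

To write a real file, pass `open("TV_links.csv", "w", newline="", encoding="utf-8")` instead of the in-memory buffer. Deduplicating before the detail stage keeps the scraper from loading the same product page twice.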

🛠️ Tech Stack

| Category | Tools |
|---|---|
| Language | Python |
| Web Scraping | Selenium WebDriver, ChromeDriver |
| Data Handling | Pandas, CSV |
| Sentiment Analysis | Hugging Face Transformers, TextBlob, NLTK |
| ML Models | `nlptown/bert-base-multilingual-uncased-sentiment`, `sagorsarker/bangla-bert-base` |
| Translation | deep-translator, indic-transliteration |

📊 Data Collected

| Category | Platform | Scope | File |
|---|---|---|---|
| Smartphones | Daraz | Page 1 listings | `daraz_smartphones_page1.csv` |
| TVs | Daraz | Full details + reviews | `TV_daraz_products.csv` |
| Earbuds | Daraz | Full details + reviews | `earbuds_daraz_products.csv` |
| Mobile Accessories | Pickaboo | Full details + reviews | `mobile_accessories_pickaboo.csv` |
| Phone Cases | Daraz | Links collected | `Daraz/thikase_daraz_links.csv` |

Fields Scraped per Product

- Product name, brand, discounted price, listed price, discount %
- Warranty info, seller name, availability
- Average rating, rating count, image URL
- Product description, customer reviews
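The price fields are scraped as display strings, so they need to be converted to numbers before analysis. The sketch below is a minimal example of that step. It is not the scrapers' actual code. It assumes a price looks like `৳ 1,299` or `Tk. 2,500.50`: an optional currency marker followed by ASCII digits with comma thousands separators. The sketch also recomputes the discount % from the two prices, which gives a cross-check against the scraped value. The function names are hypothetical.

```python
import re
from typing import Optional

# Assumed price format: optional currency marker ("৳", "Tk.") followed by
# ASCII digits with comma thousands separators, e.g. "৳ 1,299".
_PRICE_NUMBER = re.compile(r"[0-9][0-9,]*(?:\.[0-9]+)?")


def parse_price(text: Optional[str]) -> Optional[float]:
    """Return the first number in a price string, or None if there is none."""
    if not text:
        return None
    match = _PRICE_NUMBER.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def discount_percent(discounted: Optional[float], listed: Optional[float]) -> Optional[float]:
    """Percentage off the listed price, rounded to one decimal place."""
    if discounted is None or listed is None or listed <= 0:
        return None
    return round((listed - discounted) / listed * 100, 1)


if __name__ == "__main__":
    sale = parse_price("৳ 1,299")
    listed = parse_price("৳ 1,999")
    print(sale, listed, discount_percent(sale, listed))
```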

🧠 Sentiment Analysis

The analysis notebook (`Sentimental Analysis/analysis.ipynb`) implements the following steps:

- Language detection: classifies each review as Bangla, Banglish, or English.
- English reviews: scored with TextBlob polarity.
- Bangla/Banglish reviews: scored with the multilingual BERT model `nlptown/bert-base-multilingual-uncased-sentiment`.
- Star-to-polarity mapping: converts the model's 1–5 star prediction to the -1.0 to +1.0 polarity scale, which puts BERT and TextBlob scores on the same scale.
- Batch processing: runs over entire CSV files and computes the average polarity per product.
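The routing and mapping steps above can be sketched as follows. This is an illustration under stated assumptions and is not the notebook's code.

- Language detection is a hypothetical heuristic. Any character in the Bengali Unicode block counts as Bangla. A hit in a tiny, made-up list of romanized Bangla words counts as Banglish. Everything else counts as English.
- The star-to-polarity mapping is assumed to be linear, `(stars - 3) / 2`. That formula sends 1 star to -1.0, 3 stars to 0.0, and 5 stars to +1.0.
- The two scorers are passed in as plain functions, so the sketch runs without TextBlob or Transformers installed.
- Leaving `confidence` as `None` for English reviews is also an assumption.

```python
import re
from typing import Any, Callable, Dict

# Characters from the Bengali Unicode block (U+0980 to U+09FF).
_BANGLA_CHARS = re.compile(r"[\u0980-\u09FF]")

# Hypothetical, deliberately small set of common romanized-Bangla words.
_BANGLISH_WORDS = {"valo", "bhalo", "kharap", "onek", "khub", "jinis", "hoise", "hoyeche", "pailam"}


def detect_language(text: str) -> str:
    """Heuristically classify a review as 'bangla', 'banglish', or 'english'."""
    if _BANGLA_CHARS.search(text):
        return "bangla"
    words = re.findall(r"[a-z]+", text.lower())
    if any(word in _BANGLISH_WORDS for word in words):
        return "banglish"
    return "english"


def stars_from_label(label: str) -> int:
    """Parse star labels such as '1 star' or '4 stars' into an integer."""
    return int(label.split()[0])


def stars_to_polarity(stars: int) -> float:
    """Map 1..5 stars linearly onto -1.0..+1.0 (1 -> -1.0, 3 -> 0.0, 5 -> +1.0)."""
    if not 1 <= stars <= 5:
        raise ValueError(f"stars must be between 1 and 5, got {stars}")
    return (stars - 3) / 2


def score_review(
    text: str,
    english_polarity: Callable[[str], float],
    star_classifier: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Route a review to the English scorer or the star classifier by language."""
    language = detect_language(text)
    if language == "english":
        # A polarity scorer such as TextBlob already returns a value in -1.0..+1.0.
        return {"language": language, "polarity": english_polarity(text), "confidence": None}
    prediction = star_classifier(text)  # expected shape: {"label": "4 stars", "score": 0.61}
    stars = stars_from_label(prediction["label"])
    return {
        "language": language,
        "polarity": stars_to_polarity(stars),
        "confidence": prediction["score"],
    }
```

To use the real libraries, `english_polarity` could be `lambda t: TextBlob(t).sentiment.polarity`. `star_classifier` could wrap a Transformers pipeline: create `clf = pipeline("sentiment-analysis", model="nlptown/bert-base-multilingual-uncased-sentiment")` and pass `lambda t: clf(t)[0]`. Injecting the scorers this way keeps the routing and mapping logic testable without downloading the model.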

Sentiment Output

| Column | Description |
|---|---|
| `original_comment` | Raw review text |
| `sentiment` | Label: Positive, Neutral, or Negative |
| `polarity` | Score from -1.0 (most negative) to +1.0 (most positive) |
| `confidence` | Model confidence score |
| `avg_polarity_per_row` | Average polarity per row (one row per product) |
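The `sentiment` and `avg_polarity_per_row` columns can be derived from per-review polarities roughly as shown below. This is a sketch and not the notebook's code. The ±0.05 neutral band is an assumed threshold, as is rounding the row average to three decimals. Both function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def polarity_to_label(polarity: float, neutral_band: float = 0.05) -> str:
    """Bucket a polarity score into Positive / Neutral / Negative (threshold assumed)."""
    if polarity > neutral_band:
        return "Positive"
    if polarity < -neutral_band:
        return "Negative"
    return "Neutral"


def avg_polarity_per_row(polarities: List[float]) -> Optional[float]:
    """Mean polarity of all reviews in one row (one product); None if the row has none."""
    if not polarities:
        return None
    return round(mean(polarities), 3)


if __name__ == "__main__":
    # Hypothetical per-review polarities for one product row.
    row_polarities = [1.0, 0.5, -0.5]
    print([polarity_to_label(p) for p in row_polarities], avg_polarity_per_row(row_polarities))
```

A symmetric neutral band means that a 3-star BERT prediction (polarity 0.0) and a flat TextBlob score both land in Neutral.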

🚀 Setup & Usage

Prerequisites

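Install the dependencies. Transformers also needs a backend such as PyTorch to run the BERT model.

pip install selenium pandas transformers torch textblob nltk deep-translator indic-transliteration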

Running Scrapers

1. Download the ChromeDriver version that matches your installed Chrome version.
2. Update the driver path in the scraper scripts.
3. Run any scraper:

python s1.py                # Scrape TV products
python scraper.py           # Scrape Pickaboo products
python Scrap/daraz_scraper.py  # Scrape earbuds

Running Sentiment Analysis

Open `Sentimental Analysis/analysis.ipynb` in Jupyter Notebook and run the cells in order.

👥 Contributors

Data Wrangling course project, 5th Trimester, Spring 2025. Repository: `Az-main/Data-Wrangling-Project`.

📄 License

This project is for academic purposes only.
