A Data Wrangling course project that scrapes product data and customer reviews from Bangladeshi e-commerce platforms (Daraz & Pickaboo), then runs multilingual sentiment analysis (Bangla, Banglish, English) on those reviews.
Automated scraping of product details and customer reviews using Selenium WebDriver.
Multilingual sentiment classification using Multilingual BERT and TextBlob to determine customer satisfaction polarity.
├── scraping.py # Daraz smartphone listing scraper
├── product_links_collector.py # Daraz earbuds product link collector
├── p1.py # Daraz TV product link collector
├── s1.py # Daraz TV product detail + review scraper
├── scraper.py # Pickaboo product detail scraper
├── reviews.py # Daraz review-only scraper
├── Scrap/
│ ├── daraz_scraper.py # Daraz earbuds full scraper (with review pagination)
│ ├── earbuds_links.csv # Collected earbuds product links
│ └── earbuds_final_continued.csv
├── Daraz/
│ └── thikase_daraz_links.csv # Phone case product links
├── Sentimental Analysis/
│ ├── analysis.ipynb # Sentiment analysis notebook
│ ├── for_model.csv # Prepared data for model input
│ └── reviews_with_sentiment.csv # Final output with sentiment labels
├── daraz_smartphones_page1.csv # Scraped smartphone data
├── earbuds_daraz_products.csv # Scraped earbuds data
├── mobile_accessories_pickaboo.csv # Scraped Pickaboo accessories data
├── TV_daraz_products.csv # Scraped TV product data
├── TV_links.csv # Collected TV product links
└── README.md
| Category | Tools |
|---|---|
| Language | Python |
| Web Scraping | Selenium WebDriver, ChromeDriver |
| Data Handling | Pandas, CSV |
| Sentiment Analysis | HuggingFace Transformers, TextBlob, NLTK |
| ML Models | nlptown/bert-base-multilingual-uncased-sentiment, sagorsarker/bangla-bert-base |
| Translation | deep-translator, indic-transliteration |
| Category | Platform | Products | File |
|---|---|---|---|
| Smartphones | Daraz | Page 1 listings | daraz_smartphones_page1.csv |
| TVs | Daraz | Full details + reviews | TV_daraz_products.csv |
| Earbuds | Daraz | Full details + reviews | earbuds_daraz_products.csv |
| Mobile Accessories | Pickaboo | Full details + reviews | mobile_accessories_pickaboo.csv |
| Phone Cases | Daraz | Links collected | Daraz/thikase_daraz_links.csv |
- Product name, brand, discounted price, listed price, discount %
- Warranty info, seller name, availability
- Average rating, rating count, image URL
- Product description, customer reviews
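The fields above can be pictured as one row per product. The following is a minimal sketch of such a row and of writing it to CSV, not the project's actual schema: the class name `ProductRecord`, every field name, the helper names, and the idea of joining reviews into a single cell are assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from typing import Optional


@dataclass
class ProductRecord:
    """One scraped product row (hypothetical field names, not the real CSV headers)."""
    name: str
    brand: str
    discounted_price: float
    listed_price: float
    discount_percent: float
    warranty: str
    seller: str
    availability: str
    avg_rating: Optional[float]  # None when the product has no ratings yet
    rating_count: int
    image_url: str
    description: str
    reviews: str  # all reviews joined into one cell; the separator is an assumption


def expected_discount(listed: float, discounted: float) -> float:
    """Discount implied by the two prices, as a percentage rounded to one decimal."""
    if listed <= 0:
        return 0.0
    return round((listed - discounted) / listed * 100, 1)


def to_csv(records: list[ProductRecord]) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ProductRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

A helper like `expected_discount` can serve as a sanity check on scraped rows: if the scraped discount % differs noticeably from the value implied by the two prices, the row probably needs a second look.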
The analysis notebook (Sentimental Analysis/analysis.ipynb) implements:
- Language Detection — Identifies Bangla, Banglish, or English reviews
- English Reviews — Analyzed using TextBlob polarity scoring
- Bangla/Banglish Reviews — Analyzed using Multilingual BERT (`nlptown/bert-base-multilingual-uncased-sentiment`)
- Star-to-Polarity Mapping — Converts 1–5 star predictions to a -1.0 to +1.0 polarity scale
- Batch Processing — Processes entire CSV files and computes average polarity per product
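The routing and mapping steps listed above can be sketched without loading any model. This is an illustration under stated assumptions, not the notebook's code: the Unicode-range check, the tiny `BANGLISH_HINTS` word list, the function names, and the linear star mapping `(stars - 3) / 2` are all hypothetical choices.

```python
import re

# Bengali Unicode block (U+0980 to U+09FF).
BANGLA_CHARS = re.compile(r"[\u0980-\u09FF]")

# Tiny illustrative list of romanized Bangla words (assumption);
# a usable list would need to be much larger.
BANGLISH_HINTS = {"valo", "bhalo", "kharap", "onek", "khub", "darun", "nai", "chilo"}


def detect_language(text: str) -> str:
    """Return 'bangla', 'banglish', or 'english' using simple heuristics."""
    if BANGLA_CHARS.search(text):
        return "bangla"
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & BANGLISH_HINTS:
        return "banglish"
    return "english"


def star_to_polarity(stars: int) -> float:
    """Map a 1-5 star prediction linearly onto [-1.0, +1.0] (assumed mapping)."""
    if not 1 <= stars <= 5:
        raise ValueError(f"stars must be between 1 and 5, got {stars}")
    return (stars - 3) / 2


def choose_analyzer(text: str) -> str:
    """Route English text to TextBlob and everything else to multilingual BERT."""
    return "textblob" if detect_language(text) == "english" else "multilingual-bert"
```

With a linear mapping, 3 stars lands on 0.0, so neutral reviews sit at the midpoint of the polarity scale and 1 and 5 stars sit at the two ends.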
| Column | Description |
|---|---|
| `original_comment` | Raw review text |
| `sentiment` | Positive / Neutral / Negative |
| `polarity` | Score from -1.0 (most negative) to +1.0 (most positive) |
| `confidence` | Model confidence score |
| `avg_polarity_per_row` | Average polarity per product |
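To show how the `sentiment` label and the per-product average could relate to `polarity`, here is a small sketch. The ±0.1 neutral band, the `product` key used for grouping, and both function names are assumptions; the notebook may use different cutoffs and column names.

```python
from collections import defaultdict
from statistics import mean


def label_from_polarity(polarity: float, threshold: float = 0.1) -> str:
    """Turn a polarity score into a label; the neutral band width is an assumption."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


def average_polarity_by_product(rows: list[dict]) -> dict[str, float]:
    """Average the 'polarity' values of review rows grouped by a hypothetical 'product' key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["product"]].append(row["polarity"])
    return {product: mean(values) for product, values in grouped.items()}
```

For example, two reviews of one product with polarities 0.5 and 1.0 average to 0.75, which `label_from_polarity` would report as Positive.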
- Install the dependencies: `pip install selenium pandas transformers textblob nltk deep-translator indic-transliteration`
- Download ChromeDriver matching your Chrome version
- Update the driver path in the scraper scripts
- Run any scraper:
  - `python s1.py` (scrape TV products)
  - `python scraper.py` (scrape Pickaboo products)
  - `python Scrap/daraz_scraper.py` (scrape earbuds)
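For the driver path step above, one way to avoid editing each script by hand is to resolve the path at runtime. This is only a sketch: the environment variable name `CHROMEDRIVER_PATH` and the function `resolve_chromedriver_path` are hypothetical, and the scraper scripts as described here take a path set directly in the code.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def resolve_chromedriver_path(env_var: str = "CHROMEDRIVER_PATH") -> Optional[str]:
    """Return a ChromeDriver path from the environment, else from PATH, else None."""
    configured = os.environ.get(env_var)
    if configured:
        path = Path(configured).expanduser()
        if not path.is_file():
            raise FileNotFoundError(f"{env_var} points to a missing file: {path}")
        return str(path)
    # Fall back to whatever 'chromedriver' executable is on PATH, if any.
    return shutil.which("chromedriver")


# With Selenium 4, the resolved path would typically be handed to
# selenium.webdriver.chrome.service.Service(executable_path=...).
```

A `None` result means no driver was found in either place, which a script could turn into a clear error message before trying to start the browser.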
Open Sentimental Analysis/analysis.ipynb in Jupyter Notebook and run the cells.
- Data Wrangling Project — 5th Trimester, Spring 2025
This project is for academic purposes only.