Marketplace Job Listings Data Cleaner

This tool processes raw, messy job data scraped from Amazon’s careers pages. It is intended for Python developers and data analysts who need consistent, structured datasets for analysis, and it handles common scraping inconsistencies such as duplicate entries, broken HTML, and varying date formats.

Features

Deduplicate listings by job ID and title: removes exact and fuzzy duplicates
Standardize date formats: converts dates stored in various string formats to ISO 8601
Clean HTML artifacts: strips tags and normalizes whitespace in description fields
Validate and structure location data: parses city, state, and country into separate columns
Export to clean CSV or JSON: outputs a consistent, analysis-ready file
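The cleaning steps listed above could look roughly like the following. This is a minimal, standard-library-only sketch under stated assumptions, not the tool's actual code: the column names (job_id, title, location), the accepted date formats, the "City, State, Country" location layout, and the 0.9 similarity threshold used for fuzzy title matching are all hypothetical.

```python
import html
import re
from datetime import datetime
from difflib import SequenceMatcher
from typing import Dict, List, Optional

# Hypothetical input formats; "%m/%d/%Y" assumes US month/day order.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %b %Y")
TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")


def clean_html(text: str) -> str:
    """Replace tags with spaces, decode entities such as &nbsp;, collapse whitespace."""
    without_tags = TAG_RE.sub(" ", text)
    return WS_RE.sub(" ", html.unescape(without_tags)).strip()


def to_iso_date(raw: str) -> Optional[str]:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


def parse_location(raw: str) -> Dict[str, str]:
    """Split an assumed 'City, State, Country' string; missing parts become ''."""
    parts = [part.strip() for part in raw.split(",")]
    parts += [""] * (3 - len(parts))  # pad short values such as "Berlin"
    city, state, country = parts[:3]
    return {"city": city, "state": state, "country": country}


def _norm(value: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore formatting noise."""
    return WS_RE.sub(" ", value).strip().lower()


def deduplicate(rows: List[Dict[str, str]], threshold: float = 0.9) -> List[Dict[str, str]]:
    """Keep the first occurrence of each listing.

    A row is an exact duplicate if its job_id was already kept, and a fuzzy
    duplicate if its title is at least `threshold` similar (difflib ratio)
    to a kept row's title at the same location.
    """
    kept: List[Dict[str, str]] = []
    seen_ids = set()
    for row in rows:
        job_id = row.get("job_id", "").strip()
        if job_id and job_id in seen_ids:
            continue  # exact duplicate: same job ID
        title = _norm(row.get("title", ""))
        location = _norm(row.get("location", ""))
        is_fuzzy_dup = any(
            location == _norm(other.get("location", ""))
            and SequenceMatcher(None, title, _norm(other.get("title", ""))).ratio() >= threshold
            for other in kept
        )
        if is_fuzzy_dup:
            continue  # near-identical title at the same location
        if job_id:
            seen_ids.add(job_id)
        kept.append(row)
    return kept
```

Requiring the location to match before treating similar titles as duplicates is a deliberate choice in this sketch: it keeps separate openings with the same title in different cities. The pairwise title comparison is O(n²), which is acceptable for illustration but would need blocking or indexing on large scrapes.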

Usage

amazon_job_cleaner --input messy_listings.csv --output clean_listings.json
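The usage line above reads a CSV file and writes JSON. As a rough, hypothetical sketch of how a command taking these two flags could be wired up with the standard library (this is not the shipped script; the names read_rows, write_rows, and main are illustrative, and picking the output format from the file extension is an assumption):

```python
import argparse
import csv
import json
from pathlib import Path
from typing import Dict, List, Optional, Sequence


def read_rows(path: Path) -> List[Dict[str, str]]:
    """Load a CSV file into a list of dicts keyed by the header row."""
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def write_rows(rows: List[Dict[str, str]], path: Path) -> None:
    """Write rows as JSON or CSV, chosen by the output file's extension."""
    suffix = path.suffix.lower()
    if suffix == ".json":
        path.write_text(json.dumps(rows, indent=2, ensure_ascii=False), encoding="utf-8")
    elif suffix == ".csv":
        fieldnames = list(rows[0].keys()) if rows else []
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    else:
        raise ValueError(f"unsupported output format: {path.suffix!r}")


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse --input/--output, load the CSV, clean it, and export it."""
    parser = argparse.ArgumentParser(description="Clean scraped job listings (sketch).")
    parser.add_argument("--input", required=True, type=Path, help="raw CSV file")
    parser.add_argument("--output", required=True, type=Path, help=".csv or .json destination")
    args = parser.parse_args(argv)

    rows = read_rows(args.input)
    # Placeholder cleaning: trim headers and values; skip overflow columns (key None).
    cleaned = [
        {key.strip(): (value or "").strip() for key, value in row.items() if key is not None}
        for row in rows
    ]
    write_rows(cleaned, args.output)
    return 0
```

The cleaning step here only trims whitespace; real deduplication, date, HTML, and location handling would replace it. A packaged command named like the one above would usually be exposed through a console-script entry point that calls a function such as main(); whether the actual tool is packaged that way is not stated.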

Requirements

Python 3.8+. Install dependencies:

pip install -r requirements.txt

Download

Buy once for $29 and download immediately.

The ZIP includes the full script, README, and usage examples.

License

Personal & Commercial Use. You may use this tool in your own personal and commercial projects. Redistribution or resale of the source code is not permitted.