What’s in This Dataset

This dataset contains 10,000 realistic, HIPAA-safe synthetic patient records in CSV format. Each record combines patient demographics (age, gender, address), ICD-10 diagnosis codes, prescribed medications, insurance provider details, and billing amounts. The records are designed to mimic real-world patient data without compromising privacy, which makes them suitable for testing and development.

The CSV includes columns such as patient_id, age, gender, diagnosis_code, medication_name, insurance_provider, billing_amount, and admission_date. All identifiable information has been removed or altered to ensure compliance with HIPAA regulations. The structure is clean and consistent, which suits data science workflows and healthcare app development.
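Before building on the file, it can help to confirm that the columns you expect are actually there. The sketch below is one way to do that, not part of the dataset itself: check_columns is a hypothetical helper, the column list is copied from the description above (the real file may contain more), and the two sample rows are made up so the example runs without the download.

```python
import pandas as pd

# Column names as listed in the description above. The text says "columns
# such as", so the real file may hold more; extras are reported, not rejected.
EXPECTED_COLUMNS = [
    "patient_id", "age", "gender", "diagnosis_code",
    "medication_name", "insurance_provider", "billing_amount", "admission_date",
]


def check_columns(df: pd.DataFrame) -> dict:
    """Return expected columns that are missing and any unexpected extras."""
    present = set(df.columns)
    return {
        "missing": [c for c in EXPECTED_COLUMNS if c not in present],
        "extra": [c for c in df.columns if c not in EXPECTED_COLUMNS],
    }


# Hypothetical two-row sample standing in for the real file.
sample = pd.DataFrame({
    "patient_id": [1, 2],
    "age": [34, 71],
    "gender": ["F", "M"],
    "diagnosis_code": ["E11.9", "I10"],
    "medication_name": ["Metformin", "Lisinopril"],
    "insurance_provider": ["Provider A", "Provider B"],
    "billing_amount": [1250.00, 310.50],
    "admission_date": ["2023-01-15", "2023-02-03"],
})

report = check_columns(sample)
print(report)
```

With the real file, you would pass the DataFrame returned by pd.read_csv instead of the sample and review anything listed under "missing" before relying on those fields.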

Who Needs This Data

Developers building healthcare applications, data scientists training machine learning models, and QA testers require synthetic datasets that replicate real-world complexity without legal or ethical risks. Healthcare app developers often need sample patient data to validate features such as billing systems or diagnostic tools. Data scientists use similar datasets to train predictive models for disease outcomes or medication effectiveness. QA teams rely on realistic data to ensure their systems perform correctly under various conditions, especially in regulated industries.

Use Cases

  • Testing a patient management system before going live with real data
  • Training a machine learning model to predict diagnosis based on medication history
  • Validating a billing dashboard for accuracy across different insurance providers
  • Developing a mobile app that displays patient summaries and medication lists
  • Building a healthcare analytics platform that visualizes trends in diagnosis codes
  • Preparing a database seed for a new Electronic Health Record (EHR) system
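Two of the use cases above, seeding a database and checking billing totals per insurance provider, can be sketched together. This is a minimal illustration under stated assumptions: the table name patients, the in-memory SQLite database, and the three sample rows are all hypothetical, and only the column names come from the dataset description.

```python
import sqlite3

import pandas as pd

# Hypothetical rows standing in for the real CSV; with the download you would
# build this DataFrame with pd.read_csv instead.
sample = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "insurance_provider": ["Provider A", "Provider B", "Provider A"],
    "billing_amount": [100.0, 250.0, 50.0],
})

# Seed an in-memory SQLite database (assumed target; an EHR would use its own).
conn = sqlite3.connect(":memory:")
sample.to_sql("patients", conn, index=False, if_exists="replace")

# Aggregate billing per provider, the kind of figure a dashboard would show.
totals = pd.read_sql_query(
    "SELECT insurance_provider, "
    "COUNT(*) AS n_patients, "
    "SUM(billing_amount) AS total_billed "
    "FROM patients "
    "GROUP BY insurance_provider "
    "ORDER BY insurance_provider",
    conn,
)
conn.close()

print(totals)
```

Comparing these totals against what your billing dashboard displays is a simple way to catch aggregation bugs before real data is involved.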

Loading It in Python

In Python, the CSV loads directly into a pandas DataFrame, where you can inspect its contents in a few lines.

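import pandas as pd

# Read the CSV into a DataFrame (adjust the path to where you saved the file)
df = pd.read_csv('10,000_synthetic_patient_records_(hipaa-safe).csv')
# Preview the first five rows
print(df.head())
# Report (rows, columns) and the inferred type of each column
print(f"Shape: {df.shape}")
print(df.dtypes)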

This prints the first five rows, the number of rows and columns, and the data type of each column. You’ll see a structured table of patient records ready for analysis or model training.
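After loading, a quick sanity pass on types and value ranges is usually worthwhile. The sketch below is an illustration rather than a description of the actual file: the inline CSV text is made up (with one deliberately bad row), and the 0 to 120 age range and the non-negative billing rule are assumptions you may want to change.

```python
import io

import pandas as pd

# Made-up CSV text using a few of the documented column names; the third row
# is deliberately invalid so the checks have something to flag.
csv_text = """patient_id,age,billing_amount,admission_date
1,34,1250.00,2023-01-15
2,71,310.50,2023-02-03
3,-5,99.99,not-a-date
"""

df = pd.read_csv(io.StringIO(csv_text))

# Convert dates explicitly; unparseable values become NaT instead of raising.
df["admission_date"] = pd.to_datetime(df["admission_date"], errors="coerce")

# One boolean column per rule (the thresholds are assumptions, not dataset facts).
problems = pd.DataFrame({
    "bad_age": ~df["age"].between(0, 120),
    "bad_date": df["admission_date"].isna(),
    "negative_bill": df["billing_amount"] < 0,
})

# Rows that break at least one rule.
flagged = df[problems.any(axis=1)]

print(problems.sum())
print(flagged)
```

On the real file, replace io.StringIO(csv_text) with the CSV path; an empty flagged frame means every row passed the checks you chose.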

Get the Dataset

Download 10,000 Synthetic Patient Records (HIPAA-safe) →

$39 one-time. Instant download. CSV format, ready to use.

More datasets and Python tools at OddShop