===================================
Data Migration Script Guide
===================================
.. note::
This guide documents the Python-based ``data_migration_script.py`` for migrating
from the legacy PHP Physics Stores system (MySQL) to the modern LUStores system (PostgreSQL).
Overview
========
The ``data_migration_script.py`` is a specialized migration tool designed to handle the specific migration from:
- **Source**: MySQL-based ``physicsstores`` database (Legacy PHP system)
- **Target**: PostgreSQL-based ``university_inventory`` database (Modern LUStores)
Script Location
---------------
::
/data/LUStores/scripts/data_migration_script.py
Features
========
✅ **Automated Schema Migration**
- Automatically applies any pending SQL migrations from ``/migrations/`` directory
- Tracks applied migrations to avoid duplicates
✅ **Data Preservation**
- Preserves existing admin users during migration
- Creates fallback "Legacy System" user for data integrity
- Maintains referential integrity across all tables
✅ **Comprehensive Logging**
- Detailed migration logs saved to ``data_migration.log``
- Real-time console output with progress indicators
- Error tracking and statistics
✅ **Data Transformation**
- Maps legacy MySQL schema to modern PostgreSQL schema
- Handles password hashing with bcrypt
- Converts data types and formats automatically
Prerequisites
=============
System Requirements
-------------------
**On the migration machine** (can be local or VM):
- Python 3.8+
- Required Python packages::
pip install psycopg2-binary pymysql bcrypt
**Access Requirements**:
- SSH/network access to legacy MySQL database
- SSH/network access to target PostgreSQL database
- Read permissions on legacy database
- Write permissions on target database
Schema Export File
------------------
The script requires a schema mapping file::
/data/LUStores/migrations/database_schemas_export_20250911_152208.json
This JSON file contains the schema structure of both source and target databases.
Migration Process
=================
Data Flow Diagram
-----------------
.. mermaid::
graph LR
subgraph "Legacy System"
MYSQL[(MySQL
physicsstores)]
TABLES["Tables:
• users
• stock
• supplier
• orders
• issues
• charge"]
end
subgraph "Migration Script"
SCRIPT[data_migration_script.py]
TRANSFORM[Data Transformation
& Mapping]
VALIDATE[Validation &
Error Handling]
end
subgraph "Modern System"
POSTGRES[(PostgreSQL
university_inventory)]
NEWTABLES["Tables:
• users
• items
• suppliers
• orders + order_items
• sales + sale_items
• chargecodes"]
end
MYSQL --> TABLES
TABLES --> SCRIPT
SCRIPT --> TRANSFORM
TRANSFORM --> VALIDATE
VALIDATE --> NEWTABLES
NEWTABLES --> POSTGRES
style MYSQL fill:#ff6b6b
style POSTGRES fill:#6bcf7f
style SCRIPT fill:#4d96ff
Table Mappings
==============
The script performs the following table migrations:
.. list-table:: Migration Mapping
:header-rows: 1
:widths: 25 25 50
* - Legacy Table
- Target Table(s)
- Notes
* - ``users``
- ``users``
- Role mapping: LEVEL ≥10→admin, ≥5→manager, else→user
* - ``stock``
- ``items``
- All items assigned to "General" category
* - ``supplier``
- ``suppliers``
- Direct mapping with address concatenation
* - ``charge``
- ``chargecodes``
- COSTCENTRE becomes primary code
* - ``orders``
- ``orders`` + ``order_items``
- Denormalized→normalized structure
* - ``issues``
- ``sales`` + ``sale_items``
- Stock issues converted to sales records
Field Transformations
=====================
Users Table
-----------
.. code-block:: text
Legacy (MySQL) Target (PostgreSQL)
────────────── ───────────────────
USERNAME → email (with @physics.lancs.ac.uk if missing @)
USERPASSWORD → password_hash (bcrypt hashed)
USERNAME → first_name
'User' → last_name
LEVEL (10+/5+/other) → role (admin/manager/user)
(always) → is_active = true
(always) → must_change_password = true
Items Table
-----------
.. code-block:: text
Legacy (MySQL) Target (PostgreSQL)
────────────── ───────────────────
DESC1 → name
CODE → sku
DESC1 + DESC2 → description
(all items) → category_id = "General" category
PRICE → price
0.20 → vat_rate (20% VAT)
false → vat_included
BALANCE → current_stock
MIN → minimum_stock
HIDDEN != 'Y' → is_active
Suppliers Table
---------------
.. code-block:: text
Legacy (MySQL) Target (PostgreSQL)
────────────── ───────────────────
CODE → id
NAME → name
(none) → contact (null)
(none) → email (null)
TELEPHONE → phone
ADDRESS1-4 (concatenated) → address
Orders & Order Items
--------------------
.. code-block:: text
Legacy (MySQL) Target (PostgreSQL)
────────────── ───────────────────
Orders (denormalized):
ORDER_NO → orders.order_id
SUPPLIER → orders.supplier_id
DATE → orders.status (if set: completed, else: pending)
SUM(VALUE) → orders.total_amount
Order Items (created from order rows):
STOCK_CODE → order_items.item_sku
PRICE → order_items.unit_cost
QTY → order_items.quantity
VALUE → order_items.total_cost
RECD_DATE != null → order_items.received = true
Issues → Sales Migration
------------------------
.. code-block:: text
Legacy (MySQL) Target (PostgreSQL)
────────────── ───────────────────
Sales (created from issues):
STOCK_CODE + DATE + counter → sales.sale_id (LEGACY-CODE-DATE-NNNNNN)
'GENERAL' → sales.charge_code
(calculated) → sales.subtotal_amount
(calculated at 20% VAT) → sales.vat_amount
(total inc VAT) → sales.total_amount
'completed' → sales.status
Customer Info (JSON):
ISSUEDTO → customer_info.name
REASON → customer_info.reason
COSTCENTRE, ACTIVITY, etc. → customer_info.legacy_data
Usage Guide
===========
Basic Usage
-----------
Command-line syntax::
python scripts/data_migration_script.py \
--schema-file migrations/database_schemas_export_20250911_152208.json \
--source-host legacy-db.university.edu \
--source-port 3306 \
--source-user readonly_user \
--source-password 'legacy_password' \
--source-database physicsstores \
--target-host localhost \
--target-port 5432 \
--target-user postgres \
--target-password 'postgres_password' \
--target-database university_inventory
Step-by-Step Migration
-----------------------
**Step 1: Prepare Backup**
Before running migration, backup the target database::
docker compose -f docker-compose.prod.yml exec -T db \
pg_dump -U postgres university_inventory \
| gzip > "backups/pre_migration_$(date +%Y%m%d_%H%M%S).sql.gz"
**Step 2: Verify Access**
Test connections to both databases::
# Test MySQL connection
mysql -h legacy-db.university.edu -u readonly_user -p physicsstores \
-e "SELECT COUNT(*) FROM stock;"
# Test PostgreSQL connection
docker compose -f docker-compose.prod.yml exec db \
psql -U postgres -d university_inventory -c "SELECT version();"
**Step 3: Run Migration**
Execute the migration script::
cd /data/LUStores
python scripts/data_migration_script.py \
--schema-file migrations/database_schemas_export_20250911_152208.json \
--source-host YOUR_LEGACY_HOST \
--source-user YOUR_LEGACY_USER \
--source-password 'YOUR_LEGACY_PASSWORD' \
--source-database physicsstores \
--target-host localhost \
--target-port 5432 \
--target-user postgres \
--target-password 'YOUR_POSTGRES_PASSWORD' \
--target-database university_inventory
**Step 4: Monitor Progress**
The script provides real-time output::
[2025-01-22 14:30:15] - INFO - Starting complete data migration...
[2025-01-22 14:30:15] - INFO - Source: physicsstores (mysql)
[2025-01-22 14:30:15] - INFO - Target: university_inventory (postgres)
[2025-01-22 14:30:16] - INFO - Step 0: Verifying and applying database migrations...
[2025-01-22 14:30:18] - INFO - Step 1: Clearing target database...
[2025-01-22 14:30:20] - INFO - Step 3: Migrating categories...
[2025-01-22 14:30:21] - INFO - Step 4: Migrating suppliers...
[2025-01-22 14:30:45] - INFO - Migrated supplier: Acme Corp -> ID ACME001
[2025-01-22 14:31:15] - INFO - Step 5: Migrating items...
[2025-01-22 14:32:30] - INFO - Migrated item: LAB-001 -> ID 1523
...
[2025-01-22 14:45:00] - INFO - Migration completed successfully!
**Step 5: Review Log File**
Check the detailed log::
tail -100 data_migration.log
**Step 6: Verify Migration**
Run verification queries:
.. code-block:: sql
-- Check migrated counts
SELECT 'users' as table, COUNT(*) as count FROM users
UNION ALL
SELECT 'items', COUNT(*) FROM items
UNION ALL
SELECT 'suppliers', COUNT(*) FROM suppliers
UNION ALL
SELECT 'orders', COUNT(*) FROM orders
UNION ALL
SELECT 'chargecodes', COUNT(*) FROM chargecodes
UNION ALL
SELECT 'sales', COUNT(*) FROM sales;
Migration Sequence
==================
The script follows this exact order (critical for foreign key dependencies):
.. mermaid::
graph TD
START([Start Migration])
STEP0[Step 0: Apply SQL Migrations]
STEP1[Step 1: Clear Target Database
Preserve Admin Users]
STEP2[Step 2: Skip User Migration
Create Legacy System User]
STEP3[Step 3: Migrate Categories
Create default categories]
STEP4[Step 4: Migrate Suppliers]
STEP5[Step 5: Migrate Items]
STEP6[Step 6: Migrate Chargecodes]
STEP7[Step 7: Migrate Orders & Order Items]
STEP8[Step 8: Migrate Issues as Sales]
STEP9[Step 9: Restore Admin Users]
SUMMARY[Print Migration Summary]
DONE([Migration Complete])
START --> STEP0
STEP0 --> STEP1
STEP1 --> STEP2
STEP2 --> STEP3
STEP3 --> STEP4
STEP4 --> STEP5
STEP5 --> STEP6
STEP6 --> STEP7
STEP7 --> STEP8
STEP8 --> STEP9
STEP9 --> SUMMARY
SUMMARY --> DONE
style START fill:#4d96ff
style STEP1 fill:#ffd93d
style STEP5 fill:#ffd93d
style DONE fill:#6bcf7f
Migration Statistics
====================
After completion, the script prints a summary::
============================================================
MIGRATION SUMMARY
============================================================
users : 0 migrated, 0 errors, 5 skipped
categories : 6 migrated, 0 errors, 0 skipped
suppliers : 45 migrated, 0 errors, 0 skipped
items : 1523 migrated, 0 errors, 12 skipped
chargecodes : 89 migrated, 0 errors, 3 skipped
orders : 234 migrated, 0 errors, 1 skipped
order_items : 567 migrated, 0 errors, 2 skipped
stock_movements : 3421 migrated, 0 errors, 15 skipped
------------------------------------------------------------
TOTAL : 5885 migrated, 0 errors
Success rate: 99.5%
============================================================
Understanding the Output
------------------------
- **migrated**: Successfully transferred to target database
- **skipped**: Skipped due to duplicates or missing required fields
- **errors**: Failed during migration (should be 0 for successful migration)
Troubleshooting
===============
Common Errors
-------------
Connection Refused
~~~~~~~~~~~~~~~~~~
**Error**::
pymysql.err.OperationalError: (2003, "Can't connect to MySQL server...")
**Solutions**:
1. Verify source database host is accessible::
ping legacy-db.university.edu
2. Check firewall allows MySQL port 3306
3. Verify credentials are correct
4. Check if MySQL allows remote connections
Authentication Failed
~~~~~~~~~~~~~~~~~~~~~
**Error**::
psycopg2.OperationalError: FATAL: password authentication failed
**Solutions**:
1. Verify PostgreSQL password in ``.env.prod`` matches command
2. Check PostgreSQL ``pg_hba.conf`` allows connections
3. Ensure user has sufficient privileges
Duplicate Key Violations
~~~~~~~~~~~~~~~~~~~~~~~~~
**Error**::
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint
**Solutions**:
1. The script should have cleared the database first (Step 1)
2. If running multiple times, ensure database is cleared between runs
3. Check for pre-existing data that shouldn't be there
Out of Memory
~~~~~~~~~~~~~
**Error**::
MemoryError: Unable to allocate array
**Solutions**:
1. Increase available RAM
2. Process large tables in batches (modify script)
3. Close other applications during migration
Schema File Not Found
~~~~~~~~~~~~~~~~~~~~~~
**Error**::
FileNotFoundError: [Errno 2] No such file or directory: 'migrations/database_schemas_export...'
**Solutions**:
1. Ensure you're running from ``/data/LUStores`` directory
2. Verify schema file exists::
ls -la migrations/database_schemas_export_*.json
3. Use absolute path if needed
Partial Migration Recovery
===========================
If migration fails partway through:
**Option 1: Restore and Retry** (Recommended)
::
# Stop application
docker compose -f docker-compose.prod.yml stop app
# Restore backup
gunzip -c backups/pre_migration_TIMESTAMP.sql.gz | \
docker compose -f docker-compose.prod.yml exec -T db \
psql -U postgres -d university_inventory
# Fix the issue, then retry migration
**Option 2: Continue from Failure Point**
The script tracks which tables succeeded. You can manually complete remaining tables if needed (advanced users only).
Post-Migration Tasks
====================
1. **Verify Data Integrity**
Run validation queries to ensure data is correct
2. **Reset Admin Password**
::
docker compose -f docker-compose.prod.yml exec app \
npm run admin:reset-password admin@university.edu
3. **Test Application**
- Log in with admin account
- Verify items are visible
- Test search functionality
- Create a test order
4. **Create Post-Migration Backup**
::
docker compose -f docker-compose.prod.yml exec -T db \
pg_dump -U postgres university_inventory \
| gzip > "backups/post_migration_$(date +%Y%m%d_%H%M%S).sql.gz"
5. **Archive Legacy System**
Once verified successful:
- Take final backup of legacy system
- Document legacy system shutdown date
- Preserve access for auditing purposes
Best Practices
==============
Before Migration
----------------
- ☑ **Test on staging first** - Never run directly on production
- ☑ **Backup everything** - Both source and target databases
- ☑ **Schedule downtime** - Plan for 2-4 hours depending on data size
- ☑ **Verify schema file** - Ensure it matches both databases
- ☑ **Check disk space** - Ensure sufficient space for migration
- ☑ **Notify users** - Inform them of the migration window
During Migration
----------------
- ☑ **Monitor logs** - Watch for errors in real-time
- ☑ **Don't interrupt** - Let the script complete fully
- ☑ **Check resource usage** - Monitor CPU, memory, disk I/O
- ☑ **Save migration log** - Keep ``data_migration.log`` for records
After Migration
---------------
- ☑ **Verify counts** - Ensure record counts match expected
- ☑ **Test functionality** - Test all critical features
- ☑ **Review skipped records** - Investigate why records were skipped
- ☑ **Document issues** - Note any problems encountered
- ☑ **Train users** - Ensure team knows about any changes
Performance Considerations
==========================
For large datasets (>100,000 records):
- Migration may take several hours
- Consider running during off-hours
- Monitor database server performance
- Ensure adequate disk space for transaction logs
- Consider increasing PostgreSQL ``shared_buffers`` temporarily
Estimated Times
---------------
.. list-table:: Migration Time Estimates
:header-rows: 1
:widths: 30 35 35
* - Records
- Estimated Time
- Recommended RAM
* - < 10,000
- 5-15 minutes
- 2GB
* - 10,000 - 50,000
- 15-45 minutes
- 4GB
* - 50,000 - 100,000
- 45-90 minutes
- 8GB
* - > 100,000
- 2-4 hours
- 16GB+
Related Documentation
=====================
- :doc:`legacy-migration` - General migration guide
- :doc:`/operations/backup-restore` - Backup and restore procedures
- :doc:`/reference/database-schema` - Target database schema reference
- :doc:`/deployment/production` - Production deployment guide