Quick Reference Handbook
Important
Follow checklists in order and tick each box. If a step fails and the checklist reaches → IT Support, stop there and escalate.
Support email: inventory-support@university.edu
Server path: /data/LUStores
Compose file: docker-compose.prod.yml
Error Code Lookup
Find your error code or message, then jump to the named procedure.
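| Code / Message | Likely cause | Go to |
|---|---|---|
| | Service not running | PROC-01 Service Not Responding |
| | Database not running | PROC-04 Database Down |
| ERR-003 | Disk full | PROC-06 Disk Full |
| | TLS cert expired | PROC-05 SSL Certificate Issue |
| | Auth misconfigured | PROC-07 Login Not Working |
| ERR-006 | Missing env var | PROC-09 Missing / Wrong Environment Variables |
| ERR-007 | Bad migrations path | PROC-10 Database Migrations Not Running |
| ERR-008 | Licence not loaded | PROC-11 Licence / Feature Not Available |
| | Float drift (fixed in v1.1) | Update to latest |
| ERR-010 | Container crash loop | PROC-12 Container Crash Loop |
| ERR-011 | Insufficient RAM | PROC-13 Out of Memory |
| | Wrong file ownership | PROC-04 Database Down |
| | DB container not ready yet | Wait 30 s → PROC-01 Service Not Responding |
| ERR-014 | IdP config mismatch | PROC-08 SAML / SSO Not Working |
| ERR-015 | Rancher deploy stalled | PROC-14 Helm / Rancher Deploy Stalled |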
PROC-01 Service Not Responding
Symptoms: Site unreachable, “Connection refused”, or a container status of Exited or Restarting.
cd /data/LUStores
docker compose -f docker-compose.prod.yml ps # identify which service
☐ All services Up? → check firewall:
sudo ufw status # ports 80 and 443 must be ALLOW
☐ nginx down → PROC-02 Nginx Down
☐ app down → PROC-03 Application Service Down
☐ db down → PROC-04 Database Down
☐ None of the above → restart everything:
docker compose -f docker-compose.prod.yml restart
# wait 60 s, then test
☐ Still down → IT Support
PROC-02 Nginx Down
docker compose -f docker-compose.prod.yml logs --tail=50 nginx
☐ address already in use → another process holds port 80/443:
sudo lsof -i :80 -i :443 # find the PID
sudo kill <PID> # stop it if safe to do so
☐ ssl certificate not found → PROC-05 SSL Certificate Issue
☐ Other error → note exact message, then:
docker compose -f docker-compose.prod.yml restart nginx
☐ Still down → IT Support with log output
PROC-03 Application Service Down
docker compose -f docker-compose.prod.yml logs --tail=100 app
☐ ECONNREFUSED 5432 → DB not ready; wait 30 s then:
docker compose -f docker-compose.prod.yml restart app
☐ JWT_SECRET / SESSION_SECRET / DB_PASSWORD missing → PROC-09 Missing / Wrong Environment Variables
☐ migrations skipped or path error → PROC-10 Database Migrations Not Running
☐ Feature not available 402 → PROC-11 Licence / Feature Not Available
☐ Verify health endpoint:
curl http://localhost:5000/health
# expected: {"status":"healthy"}
☐ Still unhealthy after restart → IT Support
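Instead of re-running curl by hand after a restart, a small polling loop can wait for the health endpoint. This is a minimal bash sketch, assuming the endpoint returns JSON containing "status":"healthy" as shown above; the function name wait_for_healthy and its default attempt and delay values are illustrative, not part of LUStores.

```shell
#!/usr/bin/env bash
# Sketch: poll a health URL until its JSON body reports "status":"healthy".
# wait_for_healthy is a hypothetical helper, not part of LUStores.
# Usage: wait_for_healthy URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_healthy() {
  local url="$1" attempts="${2:-10}" delay="${3:-6}" body i
  for ((i = 1; i <= attempts; i++)); do
    # -f treats HTTP errors as failures; -s hides the progress meter
    if body=$(curl -fs --max-time 5 "$url" 2>/dev/null) &&
      printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Wait between attempts, but not after the last one
    if ((i < attempts)); then sleep "$delay"; fi
  done
  echo "not healthy after $attempts attempt(s): $url" >&2
  return 1
}
```

For example, wait_for_healthy http://localhost:5000/health 10 6 waits a little under a minute before giving up, which matches the "wait 60 s, then test" step in PROC-01.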
PROC-04 Database Down
Danger
If logs contain “corrupted” or “PANIC” — stop and → IT Support immediately. Do not attempt a restart.
docker compose -f docker-compose.prod.yml logs --tail=100 db
☐ no space left → PROC-06 Disk Full
☐ permission denied on data directory:
sudo chown -R 999:999 /db
docker compose -f docker-compose.prod.yml restart db
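☐ password authentication failed → DB_PASSWORD and POSTGRES_PASSWORD differ in .env.prod:
grep DB_PASSWORD .env.prod
grep POSTGRES_PASSWORD .env.prod
# must be identical. With the official postgres image, POSTGRES_PASSWORD is only applied when
# the data directory is first initialised, so set DB_PASSWORD to the existing database password,
# then recreate db + app (restart does not reload .env.prod):
docker compose -f docker-compose.prod.yml up -d db app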
☐ Verify DB accepting connections:
docker compose -f docker-compose.prod.yml exec db \
pg_isready -U postgres
# expected: "accepting connections"
☐ Still not accepting → IT Support
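The two grep commands above print both passwords to the terminal. A hypothetical helper, check_db_passwords, compares the values without displaying them. It is a sketch that assumes plain KEY=value lines with no quoting or export prefix; both function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: compare DB_PASSWORD and POSTGRES_PASSWORD without printing them.
# Assumes plain KEY=value lines (no quotes, no "export"); names are hypothetical.
get_env_value() {
  # Print the value from the last KEY=... line in FILE, or nothing if absent
  local file="$1" key="$2"
  { grep -E "^${key}=" "$file" || true; } | tail -n 1 | cut -d= -f2-
}

check_db_passwords() {
  local file="${1:-.env.prod}" db pg
  db=$(get_env_value "$file" DB_PASSWORD)
  pg=$(get_env_value "$file" POSTGRES_PASSWORD)
  if [[ -z "$db" || -z "$pg" ]]; then
    echo "DB_PASSWORD and/or POSTGRES_PASSWORD missing or blank in $file" >&2
    return 2
  fi
  if [[ "$db" == "$pg" ]]; then
    echo "match"
    return 0
  fi
  echo "MISMATCH: DB_PASSWORD differs from POSTGRES_PASSWORD in $file" >&2
  return 1
}
```

Run it from /data/LUStores as check_db_passwords .env.prod; exit status 1 means the values differ, 2 means one of them is missing or blank.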
PROC-05 SSL Certificate Issue
Symptoms: Browser shows “Certificate expired” or ERR_CERT_DATE_INVALID
docker compose -f docker-compose.prod.yml exec certbot \
certbot certificates # check expiry date
☐ Expired — renew:
docker compose -f docker-compose.prod.yml exec certbot \
certbot renew --force-renewal
docker compose -f docker-compose.prod.yml exec nginx nginx -s reload
☐ Renewal fails (rate limited) → wait 7 days; verify DOMAIN in .env.prod is the correct public hostname
☐ Renewal fails — port 80 blocked:
sudo ufw allow 80/tcp
☐ Certificate missing entirely (DOMAIN and EMAIL must be set in this shell; they are not read from .env.prod automatically):
docker compose -f docker-compose.prod.yml run --rm certbot \
certonly --webroot --webroot-path=/var/www/certbot \
-d "${DOMAIN}" --email "${EMAIL}" --agree-tos --non-interactive
☐ Still failing → IT Support
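${DOMAIN} and ${EMAIL} in the certonly command are expanded by your login shell, not by Docker Compose, so they must be set before the command runs (the same applies to ${SAML_IDP_METADATA_URL} in PROC-08). One way is a small loader for simple KEY=VALUE files. This is a sketch: load_env is a hypothetical helper, it assumes one value per line (optionally wrapped in a single pair of quotes), and it does not handle multi-line values or export prefixes.

```shell
#!/usr/bin/env bash
# Sketch: export simple KEY=VALUE lines from an env file into this shell.
# load_env is a hypothetical helper. It skips blank lines and # comments,
# strips one pair of surrounding quotes, and ignores anything else.
load_env() {
  local file="$1" line key value
  [[ -r "$file" ]] || { echo "cannot read $file" >&2; return 1; }
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="${line%$'\r'}"                        # tolerate CRLF line endings
    [[ -z "$line" || "$line" == \#* ]] && continue
    [[ "$line" == *=* ]] || continue
    key="${line%%=*}"
    value="${line#*=}"
    # Only accept valid shell variable names as keys
    [[ "$key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
    # Strip one pair of matching surrounding quotes, if present
    if [[ "$value" =~ ^\"(.*)\"$ || "$value" =~ ^\'(.*)\'$ ]]; then
      value="${BASH_REMATCH[1]}"
    fi
    export "$key=$value"
  done < "$file"
  return 0
}
```

Usage: cd /data/LUStores && load_env .env.prod && echo "$DOMAIN". If EMAIL is not defined in .env.prod, export it by hand before running certonly.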
PROC-06 Disk Full
Symptoms: ERR-003, DB write errors, container exits immediately
df -h # confirm which partition is full
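☐ / (root) full → clean unused Docker artefacts:
docker system prune -a # type y. Avoid --volumes: prune removes stopped containers first, so their volumes then count as unused and would be deleted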
find /data/LUStores/logs -name "*.log" -mtime +30 -delete
☐ /db full — clean old DB logs, then vacuum:
docker compose -f docker-compose.prod.yml exec db \
find /var/lib/postgresql/data/pg_log -name "*.log" -mtime +7 -delete
docker compose -f docker-compose.prod.yml exec db \
psql -U postgres -d university_inventory -c "VACUUM FULL;"
# Note: VACUUM FULL locks tables and needs enough free space to write a new copy of each table; free space first and run during off-hours
☐ Trim old backups (keep last 10):
ls -t /data/LUStores/backups | tail -n +11 | \
xargs -I{} rm /data/LUStores/backups/{}
☐ Check space again: df -h
☐ Still full → IT Support (disk expansion required)
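The ls | tail | xargs pipeline for trimming backups can be wrapped in a function that refuses to run against a missing directory and reports what it deletes. A sketch; trim_backups is an illustrative name, and it assumes backup filenames contain no newlines and that the directory holds only backup files.

```shell
#!/usr/bin/env bash
# Sketch: delete all but the newest KEEP regular files in DIR, by mtime.
# trim_backups is a hypothetical helper; filenames must not contain newlines.
trim_backups() {
  local dir="$1" keep="${2:-10}" name
  [[ -d "$dir" ]] || { echo "no such directory: $dir" >&2; return 1; }
  [[ "$keep" =~ ^[0-9]+$ ]] || { echo "KEEP must be a whole number" >&2; return 1; }
  # ls -t lists newest first; tail skips the first KEEP names
  ls -1t -- "$dir" | tail -n "+$((keep + 1))" | while IFS= read -r name; do
    if [[ -f "$dir/$name" ]]; then
      rm -f -- "$dir/$name"
      echo "removed $name"
    fi
  done
}
```

Usage: trim_backups /data/LUStores/backups 10 keeps the ten most recently modified files, matching the checklist step.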
PROC-07 Login Not Working
Symptoms: “Invalid credentials”, 401 errors, redirect loop
☐ Try the known admin account first to rule out a single-user issue
☐ Verify users exist:
docker compose -f docker-compose.prod.yml exec db \
psql -U postgres -d university_inventory \
-c "SELECT id, email, role FROM users LIMIT 5;"
☐ No users → see First Admin Account Setup
☐ SAML / SSO errors → PROC-08 SAML / SSO Not Working
☐ JWT_SECRET missing or recently changed → PROC-09 Missing / Wrong Environment Variables (changing the secret invalidates all sessions)
☐ Restart auth stack:
docker compose -f docker-compose.prod.yml restart replit-auth app
docker compose -f docker-compose.prod.yml logs --tail=50 replit-auth
☐ Still failing → IT Support
PROC-08 SAML / SSO Not Working
Symptoms: ERR-014, redirect back to login, “SAML metadata invalid”
☐ Verify IdP metadata URL is reachable:
curl -I "${SAML_IDP_METADATA_URL}" # must return 200; set the variable in this shell first, or paste the URL directly
☐ Confirm SAML_SP_ENTITY_ID in .env.prod exactly matches the IdP registration
☐ Check SP certificate expiry:
openssl x509 -in saml/sp.crt -noout -enddate
☐ Re-sync IdP metadata:
docker compose -f docker-compose.prod.yml restart app
☐ Enable local auth as emergency fallback:
# .env.prod
LOCAL_AUTH_ENABLED=true
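# then recreate app so it picks up the change (restart does not reload .env.prod):
docker compose -f docker-compose.prod.yml up -d app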
☐ Still failing → contact your IdP administrator with the SP entity ID and ACS URL (https://<DOMAIN>/auth/saml/callback)
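To catch an expiring SP certificate before it breaks SSO, openssl x509 -checkend turns the -enddate check into a pass/fail test. A sketch; cert_valid_for is a hypothetical helper, and the 30-day default window is an assumption rather than a LUStores policy.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if CERT stays valid for at least DAYS more days.
# cert_valid_for is a hypothetical helper; `openssl x509 -checkend N`
# exits non-zero when the certificate expires within N seconds.
cert_valid_for() {
  local cert="$1" days="${2:-30}"
  [[ -r "$cert" ]] || { echo "cannot read $cert" >&2; return 2; }
  openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null
}
```

Usage, assuming the saml/ directory sits under /data/LUStores: cd /data/LUStores && cert_valid_for saml/sp.crt 30 || echo "SP certificate expires within 30 days".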
PROC-09 Missing / Wrong Environment Variables
Symptoms: ERR-006, app refuses to start, blank secrets
☐ Confirm .env.prod exists:
ls -la /data/LUStores/.env.prod
☐ Check required vars are non-blank:
grep -E "^(SESSION_SECRET|JWT_SECRET|DB_PASSWORD|DOMAIN)=" .env.prod # each must appear with a value after =; a line ending in = is blank
☐ Helm/Rancher — verify the Kubernetes secret:
kubectl get secret lustores-secrets -n lustores -o jsonpath='{.data}' # values are base64-encoded
☐ Regenerate if compromised:
openssl rand -hex 64 # paste into SESSION_SECRET
openssl rand -hex 64 # paste into JWT_SECRET
# edit .env.prod, then recreate app (restart does not reload .env.prod):
docker compose -f docker-compose.prod.yml up -d app
Warning
Changing JWT_SECRET or SESSION_SECRET logs out all active users.
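The grep in the checklist shows matching lines but does not flag a missing key, or a line such as JWT_SECRET= with nothing after it. A minimal sketch that reports both cases; check_required_vars is an illustrative name, and it treats a value made only of spaces or quote characters as blank.

```shell
#!/usr/bin/env bash
# Sketch: list required keys that are missing or blank in an env file.
# check_required_vars is a hypothetical helper; a value made only of
# spaces or quote characters (e.g. JWT_SECRET="") counts as blank.
check_required_vars() {
  local file="$1" key value stripped status=0
  shift
  for key in "$@"; do
    # Value from the last KEY=... line; empty when the key is absent
    value=$({ grep -E "^${key}=" "$file" || true; } | tail -n 1 | cut -d= -f2-)
    stripped=${value//[[:space:]]/}
    stripped=${stripped//\"/}
    stripped=${stripped//\'/}
    if [[ -z "$stripped" ]]; then
      echo "missing or blank: $key"
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_required_vars .env.prod SESSION_SECRET JWT_SECRET DB_PASSWORD DOMAIN prints one line per problem and exits non-zero if any key needs fixing.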
PROC-10 Database Migrations Not Running
Symptoms: ERR-007, “column X does not exist” errors in logs
☐ Check startup logs for migration output:
docker compose -f docker-compose.prod.yml logs app | grep -E "migration|Drizzle|✅|ℹ️"
☐ migrations skipped (path error) → fixed on the deploy branch from v1.1; update:
git pull origin deploy
docker compose -f docker-compose.prod.yml up -d --build app
☐ Apply missed migrations manually:
docker compose -f docker-compose.prod.yml exec app \
node -e "require('./dist/dbInit.js').initializeDatabase()"
☐ Verify the currentStock column is numeric:
docker compose -f docker-compose.prod.yml exec db \
psql -U postgres -d university_inventory \
-c "\d items" | grep currentStock
# expected: currentStock | numeric(10,2)
☐ Still failing → IT Support with column name and error message
PROC-11 Licence / Feature Not Available
Symptoms: ERR-008, 402 responses on Analytics, Notifications, or Import pages
☐ Check licence status in app: Settings → Licence
☐ Confirm LICENCE_KEY is set in .env.prod:
grep LICENCE_KEY .env.prod
☐ Key present but features locked → licence cache may not have warmed (fixed in deploy ≥ v1.1); restart:
docker compose -f docker-compose.prod.yml restart app
☐ Key expired → paste the renewed token in Settings → Licence → Save, or update LICENCE_KEY in .env.prod and recreate app:
docker compose -f docker-compose.prod.yml up -d app
PROC-12 Container Crash Loop
Symptoms: ERR-010, status shows “Restarting”, never reaches “Up”
docker compose -f docker-compose.prod.yml logs --tail=50 <service>
☐ Out of memory → PROC-13 Out of Memory
☐ Port already in use:
sudo lsof -i :<port> # find conflicting PID
sudo kill <PID>
☐ Dependency not ready → start in order:
docker compose -f docker-compose.prod.yml stop
docker compose -f docker-compose.prod.yml up -d db
sleep 30
docker compose -f docker-compose.prod.yml up -d
☐ Watch until stable:
watch -n 3 'docker compose -f docker-compose.prod.yml ps' # Ctrl+C to exit
☐ Still crashing after 3 attempts → IT Support with full logs
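The fixed sleep 30 in the start-in-order step can be replaced with a readiness check that polls pg_isready until the database accepts connections. A sketch of a generic retry helper; retry is an illustrative name, not a LUStores or Docker command.

```shell
#!/usr/bin/env bash
# Sketch: run a command until it succeeds, up to ATTEMPTS times, DELAY seconds apart.
# retry is a hypothetical helper. Usage: retry ATTEMPTS DELAY command [args...]
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Wait between attempts, but not after the last one
    if ((i < attempts)); then sleep "$delay"; fi
  done
  echo "gave up after $attempts attempt(s): $*" >&2
  return 1
}

# Example (run from /data/LUStores): start db, wait until it accepts
# connections (up to 30 x 2 s), then start everything else.
#   docker compose -f docker-compose.prod.yml up -d db
#   retry 30 2 docker compose -f docker-compose.prod.yml exec -T db pg_isready -U postgres
#   docker compose -f docker-compose.prod.yml up -d
```

Polling returns as soon as the database is ready and fails loudly if it never becomes ready, whereas a fixed sleep either wastes time or is too short.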
PROC-13 Out of Memory
Symptoms: ERR-011, containers killed randomly, system very slow
free -h
docker stats --no-stream # look for containers near their limit
☐ Restart memory-heavy services:
docker compose -f docker-compose.prod.yml restart app redis
☐ Release page cache:
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
☐ Add temporary swap if none exists (swapon --show prints nothing):
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile
☐ If memory keeps growing → restart all services overnight:
docker compose -f docker-compose.prod.yml restart
☐ Repeated OOM → IT Support (server needs more RAM)
PROC-14 Helm / Rancher Deploy Stalled
Symptoms: ERR-015, deployment stuck in “Deploying” for > 10 minutes
kubectl get pods -n lustores
kubectl describe pod <pod> -n lustores | tail -30 # check Events section
☐ ImagePullBackOff → Docker Hub credentials missing:
kubectl get secret regcred -n lustores # if missing, see deployment/docker-hub-setup.md
☐ Pending → check node resources:
kubectl describe nodes | grep -A 5 "Allocated resources"
☐ CrashLoopBackOff → PROC-12 Container Crash Loop (same checks, but read logs with kubectl logs <pod> -n lustores --previous)
☐ Force re-deploy via Rancher UI:
Apps → lustores → ⋮ → Upgrade → Force Update
☐ Roll back:
helm rollback lustores -n lustores
☐ Still stuck → IT Support
Backup Procedures
Emergency Backup
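Run before any major change or maintenance window (--clean --if-exists adds DROP statements, so a later restore replaces existing tables instead of failing on them):
cd /data/LUStores
docker compose -f docker-compose.prod.yml exec -T db \
pg_dump -U postgres --clean --if-exists university_inventory \
| gzip > "backups/emergency_$(date +%Y%m%d_%H%M%S).sql.gz"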
ls -lht backups/ | head -n 2 # newest file first, after the "total" line; it must be > 1 MB
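Two checks make an emergency backup more trustworthy. Because pg_dump is piped into gzip, the shell reports gzip's exit status, so a failed dump can still leave a small .sql.gz behind; running set -o pipefail before the backup command makes the pipeline fail when pg_dump does. The file itself can then be checked for size and gzip integrity. A sketch; verify_backup is a hypothetical helper, and its 1 MB default mirrors the rule of thumb above rather than a measured minimum.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a compressed dump (exists, >= MIN_BYTES, valid gzip).
# verify_backup is a hypothetical helper. Usage: verify_backup FILE [MIN_BYTES]
verify_backup() {
  local file="$1" min_bytes="${2:-1048576}" size
  [[ -f "$file" ]] || { echo "missing: $file" >&2; return 1; }
  size=$(wc -c < "$file")
  size=${size//[[:space:]]/}            # some wc versions pad with spaces
  if ((size < min_bytes)); then
    echo "too small ($size bytes < $min_bytes): $file" >&2
    return 1
  fi
  # gzip -t decompresses in memory and fails on truncated or corrupt data
  gzip -t -- "$file" || { echo "corrupt gzip: $file" >&2; return 1; }
  echo "ok ($size bytes): $file"
}

# Example, after running the backup with `set -o pipefail` in effect:
#   verify_backup "$(ls -t backups/emergency_*.sql.gz | head -n 1)"
```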
Emergency Restore
Danger
This overwrites all current data. Confirm before proceeding.
ls -lh backups/ # choose backup filename
docker compose -f docker-compose.prod.yml stop app
gunzip -c backups/<FILENAME>.sql.gz | \
docker compose -f docker-compose.prod.yml exec -T db \
psql -U postgres -d university_inventory
docker compose -f docker-compose.prod.yml start app
# wait 60 s → test login → verify data
Preventive Maintenance
Weekly (5 minutes):
df -h # disk space OK?
docker compose -f docker-compose.prod.yml ps # all Up?
ls -lht backups/ | head -n 4 # newest three backups: recent?
docker compose -f docker-compose.prod.yml logs --tail=50 app | grep -i error
Monthly (30 minutes):
docker compose -f docker-compose.prod.yml exec certbot certbot certificates # SSL expiry?
docker compose -f docker-compose.prod.yml pull # new image versions?
sudo apt update && sudo apt list --upgradable # OS patches?
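The weekly df -h check can be turned into a pass/fail test with a threshold. A sketch; check_disk and its 85% default threshold are assumptions, and it relies on POSIX df -P output, where column 5 is the used percentage (filesystem names containing spaces would shift the columns).

```shell
#!/usr/bin/env bash
# Sketch: warn when the filesystem holding PATH is at or above THRESHOLD percent.
# disk_used_pct and check_disk are hypothetical helpers.
disk_used_pct() {
  # df -P prints one line per filesystem; column 5 is "Capacity", e.g. 42%
  df -P -- "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

check_disk() {
  local path="$1" threshold="${2:-85}" pct
  pct=$(disk_used_pct "$path")
  [[ "$pct" =~ ^[0-9]+$ ]] || { echo "cannot read usage for $path" >&2; return 2; }
  if ((pct >= threshold)); then
    echo "WARNING: $path is ${pct}% full (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: $path is ${pct}% full"
}

# Example: check_disk / 85; check_disk /db 85
```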
Escalation — What to Send IT Support
Collect these three files and attach them to your support email:
docker compose -f docker-compose.prod.yml logs --tail=300 > ~/lustores-logs.txt
docker compose -f docker-compose.prod.yml ps > ~/lustores-status.txt
{ df -h; free -h; } > ~/lustores-disk.txt
Include in your message:
- Error code (e.g. ERR-002) or exact error text
- Procedure attempted (e.g. PROC-04)
- The three output files above