.. _qrh:

Quick Reference Handbook
========================

.. important::

   Follow checklists in order. Check each box. Stop at **→ IT Support** if a step fails.

**Support email**: inventory-support@university.edu |
**Server path**: ``/data/LUStores`` |
**Compose file**: ``docker-compose.prod.yml``

.. contents::
   :local:
   :depth: 1

----

Error Code Lookup
-----------------

Find your error code or message, then jump to the named procedure.

.. list-table::
   :header-rows: 1
   :widths: 30 30 40

   * - Code / Message
     - Likely cause
     - Go to
   * - ``ERR-001`` / Connection refused
     - Service not running
     - :ref:`qrh-service-down`
   * - ``ERR-002`` / ECONNREFUSED 5432
     - Database not running
     - :ref:`qrh-db-down`
   * - ``ERR-003`` / No space left on device
     - Disk full
     - :ref:`qrh-disk-full`
   * - ``ERR-004`` / certificate has expired
     - TLS cert expired
     - :ref:`qrh-ssl`
   * - ``ERR-005`` / Invalid credentials / 401
     - Auth misconfigured
     - :ref:`qrh-auth`
   * - ``ERR-006`` / JWT_SECRET not set
     - Missing env var
     - :ref:`qrh-env`
   * - ``ERR-007`` / Migrations skipped
     - Bad migrations path
     - :ref:`qrh-migrations`
   * - ``ERR-008`` / Feature not available (402)
     - Licence not loaded
     - :ref:`qrh-licence`
   * - ``ERR-009`` / Cannot have negative stock
     - Float drift (fixed v1.1)
     - Update to latest ``deploy`` branch
   * - ``ERR-010`` / Health check failing / Restarting
     - Container crash loop
     - :ref:`qrh-crash-loop`
   * - ``ERR-011`` / Out of memory / OOMKilled
     - Insufficient RAM
     - :ref:`qrh-memory`
   * - ``ERR-012`` / Permission denied (pg data)
     - Wrong file ownership
     - ``sudo chown -R 999:999 /db``
   * - ``ERR-013`` / getaddrinfo EAI_AGAIN db
     - DB container not ready yet
     - Wait 30 s → :ref:`qrh-service-down`
   * - ``ERR-014`` / SAML metadata invalid
     - IdP config mismatch
     - :ref:`qrh-saml`
   * - ``ERR-015`` / Helm release stuck
     - Rancher deploy stalled
     - :ref:`qrh-helm`

----

.. _qrh-service-down:

PROC-01 Service Not Responding
--------------------------------

**Symptoms**: Site unreachable, "Connection refused", container shows Exit or Restarting.

.. code-block:: bash

   cd /data/LUStores
   docker compose -f docker-compose.prod.yml ps   # identify which service

☐ All services **Up**? → check firewall::

   sudo ufw status   # ports 80 and 443 must be ALLOW

☐ **nginx** down → :ref:`qrh-nginx`

☐ **app** down → :ref:`qrh-app`

☐ **db** down → :ref:`qrh-db-down`

☐ None of the above → restart everything::

   docker compose -f docker-compose.prod.yml restart
   # wait 60 s, then test

☐ Still down → **→ IT Support**

----

.. _qrh-nginx:

PROC-02 Nginx Down
--------------------

.. code-block:: bash

   docker compose -f docker-compose.prod.yml logs --tail=50 nginx

☐ ``address already in use`` → another process holds port 80/443::

   sudo lsof -i :80 -i :443   # find the PID
   sudo kill <PID>            # stop it if safe to do so

☐ ``ssl certificate not found`` → :ref:`qrh-ssl`

☐ Other error → note the exact message, then::

   docker compose -f docker-compose.prod.yml restart nginx

☐ Still down → **→ IT Support** with log output

----

.. _qrh-app:

PROC-03 Application Service Down
----------------------------------

.. code-block:: bash

   docker compose -f docker-compose.prod.yml logs --tail=100 app

☐ ``ECONNREFUSED 5432`` → DB not ready; wait 30 s, then::

   docker compose -f docker-compose.prod.yml restart app

☐ ``JWT_SECRET`` / ``SESSION_SECRET`` / ``DB_PASSWORD`` missing → :ref:`qrh-env`

☐ ``migrations skipped`` or path error → :ref:`qrh-migrations`

☐ ``Feature not available`` 402 → :ref:`qrh-licence`

☐ Verify health endpoint::

   curl http://localhost:5000/health
   # expected: {"status":"healthy"}

☐ Still unhealthy after restart → **→ IT Support**

----

.. _qrh-db-down:

PROC-04 Database Down
-----------------------

.. danger::

   If logs contain **"corrupted"** or **"PANIC"**, stop and **→ IT Support immediately**.
   Do not attempt a restart.

.. code-block:: bash

   docker compose -f docker-compose.prod.yml logs --tail=100 db

☐ ``no space left`` → :ref:`qrh-disk-full`

☐ ``permission denied`` on data directory::

   sudo chown -R 999:999 /db
   docker compose -f docker-compose.prod.yml restart db

☐ ``password authentication failed`` → password mismatch in ``.env.prod``::

   grep DB_PASSWORD .env.prod
   grep POSTGRES_PASSWORD .env.prod
   # must be identical; fix .env.prod, then restart db and app

☐ Verify DB accepting connections::

   docker compose -f docker-compose.prod.yml exec db \
     pg_isready -U postgres
   # expected: "accepting connections"

☐ Still not accepting → **→ IT Support**

----

.. _qrh-ssl:

PROC-05 SSL Certificate Issue
--------------------------------

**Symptoms**: Browser shows "Certificate expired" or ``ERR_CERT_DATE_INVALID``

.. code-block:: bash

   docker compose -f docker-compose.prod.yml exec certbot \
     certbot certificates   # check expiry date

☐ Expired → renew::

   docker compose -f docker-compose.prod.yml exec certbot \
     certbot renew --force-renewal
   docker compose -f docker-compose.prod.yml exec nginx nginx -s reload

☐ Renewal fails, **rate limited** → wait 7 days; verify ``DOMAIN`` in ``.env.prod`` is the correct public hostname

☐ Renewal fails, **port 80 blocked**::

   sudo ufw allow 80/tcp

☐ Certificate missing entirely::

   docker compose -f docker-compose.prod.yml run --rm certbot \
     certonly --webroot --webroot-path=/var/www/certbot \
     -d ${DOMAIN} --email ${EMAIL} --agree-tos --non-interactive

☐ Still failing → **→ IT Support**

----

.. _qrh-disk-full:

PROC-06 Disk Full
-------------------

**Symptoms**: ``ERR-003``, DB write errors, container exits immediately

.. code-block:: bash

   df -h   # confirm which partition is full

☐ ``/`` (root) full → clean Docker artefacts::

   docker system prune -a --volumes   # removes unused images, containers and volumes; type y
   find /data/LUStores/logs -name "*.log" -mtime +30 -delete

☐ ``/db`` full → clean old DB logs, then vacuum::

   docker compose -f docker-compose.prod.yml exec db \
     find /var/lib/postgresql/data/pg_log -name "*.log" -mtime +7 -delete
   docker compose -f docker-compose.prod.yml exec db \
     psql -U postgres -d university_inventory -c "VACUUM FULL;"
   # Note: VACUUM FULL locks tables; run during off-hours

☐ Trim old backups (keep last 10)::

   ls -t /data/LUStores/backups | tail -n +11 | \
     xargs -I{} rm /data/LUStores/backups/{}

☐ Check space again: ``df -h``

☐ Still full → **→ IT Support** (disk expansion required)

----

.. _qrh-auth:

PROC-07 Login Not Working
---------------------------

**Symptoms**: "Invalid credentials", 401 errors, redirect loop

☐ Try the known admin account first (to rule out a single-user issue)

☐ Verify users exist::

   docker compose -f docker-compose.prod.yml exec db \
     psql -U postgres -d university_inventory \
     -c "SELECT id, email, role FROM users LIMIT 5;"

☐ No users → see :doc:`/admin/first-admin-setup`

☐ SAML / SSO errors → :ref:`qrh-saml`

☐ ``JWT_SECRET`` missing or recently changed → :ref:`qrh-env` (changing the secret invalidates all sessions)

☐ Restart auth stack::

   docker compose -f docker-compose.prod.yml restart replit-auth app
   docker compose -f docker-compose.prod.yml logs --tail=50 replit-auth

☐ Still failing → **→ IT Support**

----

.. _qrh-saml:

PROC-08 SAML / SSO Not Working
---------------------------------

**Symptoms**: ``ERR-014``, redirect back to login, "SAML metadata invalid"

☐ Verify IdP metadata URL is reachable::

   curl -I "${SAML_IDP_METADATA_URL}"   # must return 200

☐ Confirm ``SAML_SP_ENTITY_ID`` in ``.env.prod`` exactly matches the IdP registration

☐ Check SP certificate expiry::

   openssl x509 -in saml/sp.crt -noout -enddate

☐ Re-sync IdP metadata::

   docker compose -f docker-compose.prod.yml restart app

☐ Enable local auth as emergency fallback::

   # .env.prod
   LOCAL_AUTH_ENABLED=true
   # then restart app

☐ Still failing → contact your IdP administrator with the SP entity ID and ACS URL (``https:///auth/saml/callback``)

----

.. _qrh-env:

PROC-09 Missing / Wrong Environment Variables
------------------------------------------------

**Symptoms**: ``ERR-006``, app refuses to start, blank secrets

☐ Confirm ``.env.prod`` exists::

   ls -la /data/LUStores/.env.prod

☐ Check required vars are non-blank::

   grep -E "^(SESSION_SECRET|JWT_SECRET|DB_PASSWORD|DOMAIN)=.+" .env.prod
   # expect four lines; a missing line means that variable is unset or blank

☐ Helm/Rancher → verify the Kubernetes secret::

   kubectl get secret lustores-secrets -n lustores -o jsonpath='{.data}'

☐ Regenerate if compromised::

   openssl rand -hex 64   # paste into SESSION_SECRET
   openssl rand -hex 64   # paste into JWT_SECRET
   # edit .env.prod, then restart app

.. warning::

   Changing ``JWT_SECRET`` or ``SESSION_SECRET`` logs out all active users.

----

.. _qrh-migrations:

PROC-10 Database Migrations Not Running
-----------------------------------------

**Symptoms**: ``ERR-007``, ``column X does not exist`` errors in logs

☐ Check startup logs for migration output::

   docker compose -f docker-compose.prod.yml logs app | grep -E "migration|Drizzle|✅|ℹ️"

☐ ``migrations skipped — path error`` → fixed in ``deploy`` ≥ v1.1::

   git pull origin deploy
   docker compose -f docker-compose.prod.yml up -d --build app

☐ Apply missed migrations manually::

   docker compose -f docker-compose.prod.yml exec app \
     node -e "require('./dist/dbInit.js').initializeDatabase()"

☐ Verify the ``currentStock`` column is ``numeric``::

   docker compose -f docker-compose.prod.yml exec db \
     psql -U postgres -d university_inventory \
     -c "\d items" | grep currentStock
   # expected: currentStock | numeric(10,2)

☐ Still failing → **→ IT Support** with column name and error message

----

.. _qrh-licence:

PROC-11 Licence / Feature Not Available
-----------------------------------------

**Symptoms**: ``ERR-008``, 402 responses on Analytics, Notifications, or Import pages

☐ Check licence status in app: **Settings → Licence**

☐ Confirm ``LICENCE_KEY`` is set in ``.env.prod``::

   grep LICENCE_KEY .env.prod

☐ Key present but features locked → cache may not have warmed (fixed in ``deploy`` ≥ v1.1); restart::

   docker compose -f docker-compose.prod.yml restart app

☐ Key expired → paste renewed token in **Settings → Licence → Save**, or update ``.env.prod`` and restart

----

.. _qrh-crash-loop:

PROC-12 Container Crash Loop
------------------------------

**Symptoms**: ``ERR-010``, status shows "Restarting", never reaches "Up"

.. code-block:: bash

   docker compose -f docker-compose.prod.yml logs --tail=50

☐ ``Out of memory`` → :ref:`qrh-memory`

☐ ``Port already in use``::

   sudo lsof -i :<PORT>   # find conflicting PID
   sudo kill <PID>

☐ Dependency not ready → start in order::

   docker compose -f docker-compose.prod.yml stop
   docker compose -f docker-compose.prod.yml up -d db
   sleep 30
   docker compose -f docker-compose.prod.yml up -d

☐ Watch until stable::

   watch -n 3 'docker compose -f docker-compose.prod.yml ps'
   # Ctrl+C to exit

☐ Still crashing after 3 attempts → **→ IT Support** with full logs

----

.. _qrh-memory:

PROC-13 Out of Memory
-----------------------

**Symptoms**: ``ERR-011``, containers killed randomly, system very slow

.. code-block:: bash

   free -h
   docker stats --no-stream   # look for containers near their limit

☐ Restart memory-heavy services::

   docker compose -f docker-compose.prod.yml restart app redis

☐ Release page cache::

   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

☐ Add temporary swap if none exists::

   sudo fallocate -l 2G /swapfile
   sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile

☐ If memory keeps growing → restart all services overnight::

   docker compose -f docker-compose.prod.yml restart

☐ Repeated OOM → **→ IT Support** (server needs more RAM)

----

.. _qrh-helm:

PROC-14 Helm / Rancher Deploy Stalled
----------------------------------------

**Symptoms**: ``ERR-015``, deployment stuck in "Deploying" for > 10 minutes

.. code-block:: bash

   kubectl get pods -n lustores
   kubectl describe pod -n lustores | tail -30   # check Events section

☐ ``ImagePullBackOff`` → Docker Hub credentials missing::

   kubectl get secret regcred -n lustores
   # if missing, see deployment/docker-hub-setup.md

☐ ``Pending`` → check node resources::

   kubectl describe nodes | grep -A 5 "Allocated resources"

☐ ``CrashLoopBackOff`` → :ref:`qrh-crash-loop` (same steps, but use ``kubectl logs`` instead of ``docker compose logs``)

☐ Force re-deploy via Rancher UI: **Apps → lustores → ⋮ → Upgrade → Force Update**

☐ Roll back::

   helm rollback lustores -n lustores

☐ Still stuck → **→ IT Support**

----

Backup Procedures
-----------------

.. _qrh-backup:

Emergency Backup
~~~~~~~~~~~~~~~~

Run before any major change or maintenance window:

.. code-block:: bash

   cd /data/LUStores
   docker compose -f docker-compose.prod.yml exec -T db \
     pg_dump -U postgres university_inventory \
     | gzip > "backups/emergency_$(date +%Y%m%d_%H%M%S).sql.gz"
   ls -lh backups/ | tail -1   # verify: file must be > 1 MB

.. _qrh-restore:

Emergency Restore
~~~~~~~~~~~~~~~~~

.. danger::

   This **overwrites all current data**. Confirm before proceeding.

.. code-block:: bash

   cd /data/LUStores
   ls -lh backups/   # choose backup filename
   docker compose -f docker-compose.prod.yml stop app
   gunzip -c backups/.sql.gz | \
     docker compose -f docker-compose.prod.yml exec -T db \
     psql -U postgres -d university_inventory
   docker compose -f docker-compose.prod.yml start app
   # wait 60 s → test login → verify data

----

Preventive Maintenance
-----------------------

**Weekly** (5 minutes):

.. code-block:: bash

   df -h                                           # disk space OK?
   docker compose -f docker-compose.prod.yml ps    # all Up?
   ls -lh backups/ | tail -3                       # backups recent?
   docker compose -f docker-compose.prod.yml logs --tail=50 app | grep -i error

**Monthly** (30 minutes):

.. code-block:: bash

   docker compose -f docker-compose.prod.yml exec certbot certbot certificates   # SSL expiry?
   docker compose -f docker-compose.prod.yml pull                                # new image versions?
   sudo apt update && sudo apt list --upgradable                                 # OS patches?

----

Escalation: What to Send IT Support
--------------------------------------

Collect these three files and attach them to your support email:

.. code-block:: bash

   docker compose -f docker-compose.prod.yml logs --tail=300 > ~/lustores-logs.txt
   docker compose -f docker-compose.prod.yml ps > ~/lustores-status.txt
   # group both commands so the file holds disk AND memory output
   { df -h; free -h; } > ~/lustores-disk.txt

Include in your message:

1. **Error code** (e.g. ``ERR-002``) or exact error text
2. **Procedure attempted** (e.g. ``PROC-04``)
3. The three output files above
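
The three collection commands above can also be wrapped in one helper, so that a command that fails (for example because the Docker daemon is down) still leaves a file to attach. This is a minimal sketch under stated assumptions: the function names ``capture`` and ``collect_support_bundle`` and the output folder are hypothetical and not part of LUStores; only the compose file name and the three output file names come from this handbook.

.. code-block:: bash

   # Sketch only: helper names and the output folder are hypothetical.
   COMPOSE_FILE="docker-compose.prod.yml"   # compose file named in this handbook

   # capture OUTFILE COMMAND...: save stdout and stderr of COMMAND to OUTFILE;
   # on failure, append a note instead of aborting the collection.
   capture() {
       capture_out="$1"
       shift
       if ! "$@" > "$capture_out" 2>&1; then
           echo "[command failed: $*]" >> "$capture_out"
       fi
   }

   # collect_support_bundle OUTDIR: write the three escalation files to OUTDIR.
   collect_support_bundle() {
       bundle_dir="$1"
       mkdir -p "$bundle_dir"
       capture "$bundle_dir/lustores-logs.txt" \
           docker compose -f "$COMPOSE_FILE" logs --tail=300
       capture "$bundle_dir/lustores-status.txt" \
           docker compose -f "$COMPOSE_FILE" ps
       capture "$bundle_dir/lustores-disk.txt" sh -c 'df -h; free -h'
   }

To use it, paste the functions into a shell opened in ``/data/LUStores`` and run ``collect_support_bundle ~/lustores-support``, then attach the three files from that folder. Capturing stderr alongside stdout is a deliberate choice here: when a service is down, the error text is often the most useful part of the output.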