.. _qrh:

Quick Reference Handbook
========================

.. important::
   Work through each checklist from top to bottom and tick each box. When a
   step sends you to **→ IT Support**, stop and escalate.

   **Support email**: inventory-support@university.edu |
   **Server path**: ``/data/LUStores`` |
   **Compose file**: ``docker-compose.prod.yml``

.. contents::
   :local:
   :depth: 1

----

Error Code Lookup
-----------------

Find your error code or message, then jump to the named procedure.

.. list-table::
   :header-rows: 1
   :widths: 30 30 40

   * - Code / Message
     - Likely cause
     - Go to
   * - ``ERR-001`` / Connection refused
     - Service not running
     - :ref:`qrh-service-down`
   * - ``ERR-002`` / ECONNREFUSED 5432
     - Database not running
     - :ref:`qrh-db-down`
   * - ``ERR-003`` / No space left on device
     - Disk full
     - :ref:`qrh-disk-full`
   * - ``ERR-004`` / certificate has expired
     - TLS cert expired
     - :ref:`qrh-ssl`
   * - ``ERR-005`` / Invalid credentials / 401
     - Auth misconfigured
     - :ref:`qrh-auth`
   * - ``ERR-006`` / JWT_SECRET not set
     - Missing env var
     - :ref:`qrh-env`
   * - ``ERR-007`` / Migrations skipped
     - Bad migrations path
     - :ref:`qrh-migrations`
   * - ``ERR-008`` / Feature not available (402)
     - Licence not loaded
     - :ref:`qrh-licence`
   * - ``ERR-009`` / Cannot have negative stock
     - Float drift (fixed v1.1)
     - Update to latest ``deploy`` branch
   * - ``ERR-010`` / Health check failing / Restarting
     - Container crash loop
     - :ref:`qrh-crash-loop`
   * - ``ERR-011`` / Out of memory / OOMKilled
     - Insufficient RAM
     - :ref:`qrh-memory`
   * - ``ERR-012`` / Permission denied (pg data)
     - Wrong file ownership
     - ``sudo chown -R 999:999 /db``
   * - ``ERR-013`` / getaddrinfo EAI_AGAIN db
     - DB container not ready yet
     - Wait 30 s → :ref:`qrh-service-down`
   * - ``ERR-014`` / SAML metadata invalid
     - IdP config mismatch
     - :ref:`qrh-saml`
   * - ``ERR-015`` / Helm release stuck
     - Rancher deploy stalled
     - :ref:`qrh-helm`
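
You can also match the table against a batch of log output with a script. The sketch below is a hypothetical helper, not part of LUStores. Its patterns only approximate the messages in the table. ``ERR-010`` and ``ERR-015`` are left out because they are container or release states, not log text.

.. code-block:: bash

   # Hypothetical helper: reads log text on stdin and prints each ERR code
   # whose message (approximated from the lookup table) appears in it.
   qrh_lookup() {
     local tmp
     tmp=$(mktemp) || return 1
     cat > "$tmp"                  # keep the logs so every pattern can re-read them
     printf '%s\n' \
       'ERR-001|connection refused' \
       'ERR-002|ECONNREFUSED.*5432' \
       'ERR-003|no space left on device' \
       'ERR-004|certificate has expired' \
       'ERR-005|invalid credentials' \
       'ERR-006|JWT_SECRET.*not set' \
       'ERR-007|migrations skipped' \
       'ERR-008|feature not available' \
       'ERR-009|cannot have negative stock' \
       'ERR-011|out of memory|OOMKilled' \
       'ERR-012|permission denied' \
       'ERR-013|EAI_AGAIN' \
       'ERR-014|SAML metadata invalid' |
     while IFS='|' read -r code pattern; do
       # -i: ignore case; -E: a "|" inside the pattern means "or"
       if grep -qiE -- "$pattern" "$tmp"; then
         echo "$code"
       fi
     done
     rm -f -- "$tmp"
   }

For example::

    docker compose -f docker-compose.prod.yml logs --tail=200 2>&1 | qrh_lookup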

----

.. _qrh-service-down:

PROC-01  Service Not Responding
--------------------------------

**Symptoms**: Site unreachable, "Connection refused", container shows Exit or Restarting.

.. code-block:: bash

   cd /data/LUStores
   docker compose -f docker-compose.prod.yml ps      # identify which service

☐ All services **Up** but the site is still unreachable? → check firewall::

    sudo ufw status   # ports 80 and 443 must be ALLOW

☐ **nginx** down → :ref:`qrh-nginx`

☐ **app** down → :ref:`qrh-app`

☐ **db** down → :ref:`qrh-db-down`

☐ None of the above → restart everything::

    docker compose -f docker-compose.prod.yml restart
    # wait 60 s, then test

☐ Still down → **→ IT Support**

----

.. _qrh-nginx:

PROC-02  Nginx Down
--------------------

.. code-block:: bash

   docker compose -f docker-compose.prod.yml logs --tail=50 nginx

☐ ``address already in use`` → another process holds port 80/443::

    sudo lsof -i :80 -i :443     # find the PID
    sudo kill <PID>               # stop it if safe to do so

☐ ``ssl certificate not found`` → :ref:`qrh-ssl`

☐ Other error → note exact message, then::

    docker compose -f docker-compose.prod.yml restart nginx

☐ Still down → **→ IT Support** with log output

----

.. _qrh-app:

PROC-03  Application Service Down
----------------------------------

.. code-block:: bash

   docker compose -f docker-compose.prod.yml logs --tail=100 app

☐ ``ECONNREFUSED 5432`` → DB not ready; wait 30 s then::

    docker compose -f docker-compose.prod.yml restart app

☐ ``JWT_SECRET`` / ``SESSION_SECRET`` / ``DB_PASSWORD`` missing → :ref:`qrh-env`

☐ ``migrations skipped`` or path error → :ref:`qrh-migrations`

☐ ``Feature not available`` 402 → :ref:`qrh-licence`

☐ Verify health endpoint::

    curl http://localhost:5000/health
    # expected: {"status":"healthy"}

☐ Still unhealthy after restart → **→ IT Support**
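
The health check above relies on reading the response by eye. The following sketch does the comparison for you. It assumes the endpoint returns JSON containing ``"status":"healthy"``, as shown above. ``health_ok`` and ``health_check`` are hypothetical helper names.

.. code-block:: bash

   # Succeeds if the JSON body on stdin reports "status":"healthy"
   # (spaces around the colon are tolerated; "unhealthy" does not match).
   health_ok() {
     grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
   }

   # Fetches the endpoint (fails on HTTP errors, gives up after 5 s) and checks it.
   health_check() {
     local body
     body=$(curl -fsS --max-time 5 "${1:-http://localhost:5000/health}") || return 1
     printf '%s\n' "$body" | health_ok
   }

For example: ``health_check && echo healthy || echo "NOT healthy"``.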

----

.. _qrh-db-down:

PROC-04  Database Down
-----------------------

.. danger::
   If logs contain **"corrupted"** or **"PANIC"** — stop and **→ IT Support
   immediately**. Do not attempt a restart.

.. code-block:: bash

   docker compose -f docker-compose.prod.yml logs --tail=100 db
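
You can run the danger check on the log output before any restart. This is a minimal sketch; ``db_restart_allowed`` is a hypothetical helper. A clean result only means neither marker appeared in the lines you fed it. It does not prove the database is healthy.

.. code-block:: bash

   # Hypothetical gate: reads db log lines on stdin. Fails (and says STOP) if
   # they contain "corrupted" or "PANIC", matched case-insensitively.
   db_restart_allowed() {
     if grep -Eiq 'corrupted|PANIC'; then
       echo "STOP: corruption/PANIC in db logs, do not restart; escalate to IT Support"
       return 1
     fi
     echo "no corruption/PANIC markers found in the lines checked"
   }

For example::

    docker compose -f docker-compose.prod.yml logs --tail=100 db 2>&1 | db_restart_allowed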

☐ ``no space left`` → :ref:`qrh-disk-full`

☐ ``permission denied`` on data directory::

    sudo chown -R 999:999 /db
    docker compose -f docker-compose.prod.yml restart db

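☐ ``password authentication failed`` → ``DB_PASSWORD`` and ``POSTGRES_PASSWORD`` in ``.env.prod`` do not match::

    grep DB_PASSWORD .env.prod
    grep POSTGRES_PASSWORD .env.prod
    # must be identical; fix .env.prod, then recreate db + app
    # ("restart" does not re-read .env.prod):
    docker compose -f docker-compose.prod.yml up -d db app

Note: in the official ``postgres`` image, ``POSTGRES_PASSWORD`` only takes effect
when the data directory is first initialised. On an existing database, set
``DB_PASSWORD`` to the password the database already uses rather than changing
``POSTGRES_PASSWORD``.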

☐ Verify DB accepting connections::

    docker compose -f docker-compose.prod.yml exec db \
        pg_isready -U postgres
    # expected: "accepting connections"

☐ Still not accepting → **→ IT Support**
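
This minimal sketch compares the two passwords without printing either one to the screen. It assumes ``.env.prod`` uses plain ``KEY=value`` lines with no quoting. ``passwords_match`` is a hypothetical helper.

.. code-block:: bash

   # Hypothetical check: compares DB_PASSWORD and POSTGRES_PASSWORD in an
   # env file without echoing their values. The last definition of each wins.
   passwords_match() {
     local file=${1:-.env.prod} db pg
     db=$(grep -E '^DB_PASSWORD=' "$file" | tail -n 1 | cut -d= -f2-)
     pg=$(grep -E '^POSTGRES_PASSWORD=' "$file" | tail -n 1 | cut -d= -f2-)
     if [ -z "$db" ] || [ -z "$pg" ]; then
       echo "one or both passwords are missing or blank"
       return 1
     fi
     if [ "$db" = "$pg" ]; then
       echo "match"
     else
       echo "MISMATCH"
       return 1
     fi
   }

For example: ``cd /data/LUStores && passwords_match .env.prod``.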

----

.. _qrh-ssl:

PROC-05  SSL Certificate Issue
--------------------------------

**Symptoms**: Browser shows "Certificate expired" or ``ERR_CERT_DATE_INVALID``

.. code-block:: bash

   docker compose -f docker-compose.prod.yml exec certbot \
       certbot certificates      # check expiry date

☐ Expired — renew::

    docker compose -f docker-compose.prod.yml exec certbot \
        certbot renew --force-renewal
    docker compose -f docker-compose.prod.yml exec nginx nginx -s reload

☐ Renewal fails — **rate limited** → wait 7 days; verify ``DOMAIN`` in
``.env.prod`` is the correct public hostname

☐ Renewal fails — **port 80 blocked**::

    sudo ufw allow 80/tcp

☐ Certificate missing entirely::

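    # DOMAIN and EMAIL are expanded by this shell, not read from .env.prod:
    export DOMAIN=<public hostname> EMAIL=<admin email>
    docker compose -f docker-compose.prod.yml run --rm certbot \
        certonly --webroot --webroot-path=/var/www/certbot \
        -d "${DOMAIN}" --email "${EMAIL}" --agree-tos --non-interactive
    docker compose -f docker-compose.prod.yml restart nginx    # load the new certificate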

☐ Still failing → **→ IT Support**
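
This minimal sketch checks how many days a certificate has left before expiry becomes an emergency. It uses ``openssl x509 -checkend``, which exits non-zero when the certificate expires within the given number of seconds. The name ``cert_valid_for`` and the 14-day default are illustrative, not LUStores settings.

.. code-block:: bash

   # Hypothetical helper: reads a PEM certificate on stdin and succeeds only if
   # it stays valid for at least DAYS more days (default 14).
   cert_valid_for() {
     local days=${1:-14}
     # -checkend N: exit 0 if still valid N seconds from now, 1 otherwise
     if openssl x509 -noout -checkend "$((days * 86400))" >/dev/null; then
       echo "OK: valid for at least $days more days"
     else
       echo "RENEW: expires within $days days, already expired, or not a certificate"
       return 1
     fi
   }

For example::

    # live site (DOMAIN set in this shell)
    echo | openssl s_client -connect "$DOMAIN:443" -servername "$DOMAIN" 2>/dev/null | cert_valid_for 14
    # a certificate file, e.g. the SAML SP certificate
    cert_valid_for 30 < saml/sp.crt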

----

.. _qrh-disk-full:

PROC-06  Disk Full
-------------------

**Symptoms**: ``ERR-003``, DB write errors, container exits immediately

.. code-block:: bash

   df -h          # confirm which partition is full

☐ ``/`` (root) full — clean Docker artefacts::

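    # Only while all services are Up: this also deletes stopped containers
    # and every volume that no container is using.
    docker system prune -a --volumes    # type y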
    find /data/LUStores/logs -name "*.log" -mtime +30 -delete

☐ ``/db`` full — clean old DB logs, then vacuum::

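    # pg_log is the directory name before PostgreSQL 10; 10 and later default to "log"
    docker compose -f docker-compose.prod.yml exec db \
        find /var/lib/postgresql/data/pg_log -name "*.log" -mtime +7 -delete
    docker compose -f docker-compose.prod.yml exec db \
        psql -U postgres -d university_inventory -c "VACUUM FULL;"
    # Note: VACUUM FULL rewrites every table, so it needs free space roughly the
    # size of the largest table, and it locks tables; run it off-hours, after
    # freeing space with the other steps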

☐ Trim old backups (keep last 10)::

    ls -t /data/LUStores/backups | tail -n +11 | \
        xargs -I{} rm /data/LUStores/backups/{}

☐ Check space again: ``df -h``

☐ Still full → **→ IT Support** (disk expansion required)
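
The backup trim step above deletes files immediately. This sketch applies the same "keep the newest 10" rule but shows a dry run first. ``prune_backups`` is a hypothetical helper. It assumes the directory holds only backup files and that no file name contains a newline.

.. code-block:: bash

   # Hypothetical helper: keeps the KEEP newest entries in DIR (default 10).
   # Dry run by default; pass --delete as the third argument to remove files.
   prune_backups() {
     local dir=$1 keep=${2:-10} mode=${3:-}
     # ls -t lists newest first; tail skips the first KEEP names
     ls -1t -- "$dir" | tail -n +"$((keep + 1))" |
     while IFS= read -r name; do
       if [ "$mode" = "--delete" ]; then
         rm -f -- "$dir/$name" && echo "deleted $name"
       else
         echo "would delete $name"
       fi
     done
   }

For example::

    prune_backups /data/LUStores/backups 10            # list what would go
    prune_backups /data/LUStores/backups 10 --delete   # then delete it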

----

.. _qrh-auth:

PROC-07  Login Not Working
---------------------------

**Symptoms**: "Invalid credentials", 401 errors, redirect loop

☐ Try the known admin account first (rule out single-user issue)

☐ Verify users exist::

    docker compose -f docker-compose.prod.yml exec db \
        psql -U postgres -d university_inventory \
        -c "SELECT id, email, role FROM users LIMIT 5;"

☐ No users → see :doc:`/admin/first-admin-setup`

☐ SAML / SSO errors → :ref:`qrh-saml`

☐ ``JWT_SECRET`` missing or recently changed → :ref:`qrh-env`
(changing the secret invalidates all sessions)

☐ Restart auth stack::

    docker compose -f docker-compose.prod.yml restart replit-auth app
    docker compose -f docker-compose.prod.yml logs --tail=50 replit-auth

☐ Still failing → **→ IT Support**

----

.. _qrh-saml:

PROC-08  SAML / SSO Not Working
---------------------------------

**Symptoms**: ``ERR-014``, redirect back to login, "SAML metadata invalid"

☐ Verify IdP metadata URL is reachable::

    # copy SAML_IDP_METADATA_URL from .env.prod into this shell first
    curl -sS -o /dev/null -w '%{http_code}\n' "$SAML_IDP_METADATA_URL"    # must print 200

☐ Confirm ``SAML_SP_ENTITY_ID`` in ``.env.prod`` exactly matches the IdP registration

☐ Check SP certificate expiry::

    openssl x509 -in saml/sp.crt -noout -enddate

☐ Re-sync IdP metadata::

    docker compose -f docker-compose.prod.yml restart app

☐ Enable local auth as emergency fallback::

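    # .env.prod
    LOCAL_AUTH_ENABLED=true
    # then recreate the app so it reads the change ("restart" does not):
    #   docker compose -f docker-compose.prod.yml up -d app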

☐ Still failing → contact your IdP administrator with SP entity ID and
ACS URL (``https://<DOMAIN>/auth/saml/callback``)
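
A ``200`` from the metadata URL does not prove the URL returned metadata, because an HTML login or error page can also return 200. This rough sketch checks for an ``EntityDescriptor`` and prints its ``entityID``. ``saml_entity_id`` is a hypothetical helper that does plain text matching, not XML validation. It expects the attribute value in double quotes.

.. code-block:: bash

   # Hypothetical helper: reads SAML metadata XML on stdin, checks that it
   # looks like an EntityDescriptor, and prints the first entityID attribute.
   saml_entity_id() {
     local xml id
     xml=$(cat)
     case $xml in
       *EntityDescriptor*) ;;
       *) echo "not SAML metadata: no EntityDescriptor element" >&2; return 1 ;;
     esac
     id=$(printf '%s\n' "$xml" | grep -o 'entityID="[^"]*"' | awk -F'"' 'NR == 1 { print $2 }')
     if [ -z "$id" ]; then
       echo "EntityDescriptor found but no entityID attribute" >&2
       return 1
     fi
     echo "$id"
   }

For example: ``curl -fsS "$SAML_IDP_METADATA_URL" | saml_entity_id``, then compare the result with the IdP entity ID your IdP administrator expects.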

----

.. _qrh-env:

PROC-09  Missing / Wrong Environment Variables
------------------------------------------------

**Symptoms**: ``ERR-006``, app refuses to start, blank secrets

☐ Confirm ``.env.prod`` exists::

    ls -la /data/LUStores/.env.prod

☐ Check required vars are non-blank::

    grep -E "^(SESSION_SECRET|JWT_SECRET|DB_PASSWORD|DOMAIN)=" .env.prod

☐ Helm/Rancher — verify the Kubernetes secret::

    kubectl get secret lustores-secrets -n lustores -o jsonpath='{.data}'

☐ Regenerate if compromised::

    openssl rand -hex 64   # paste into SESSION_SECRET
    openssl rand -hex 64   # paste into JWT_SECRET
    # edit .env.prod, then: docker compose -f docker-compose.prod.yml up -d app

.. warning::
   Changing ``JWT_SECRET`` or ``SESSION_SECRET`` logs out all active users.
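
Checking for blank values by reading ``grep`` output prints every secret to the terminal. This minimal sketch reports each variable's state without showing its value. It also includes a helper that writes a regenerated secret back to the file. Both helpers are hypothetical and assume plain ``KEY=value`` lines. ``set_env_var`` leaves the previous file in ``.env.prod.bak``; delete that file once the app is back up.

.. code-block:: bash

   # Report each KEY as OK / BLANK / MISSING without printing its value.
   check_env_file() {
     local file=$1 key value status=0
     shift
     for key in "$@"; do
       if ! grep -q "^${key}=" "$file"; then
         echo "MISSING $key"; status=1; continue
       fi
       value=$(grep "^${key}=" "$file" | tail -n 1 | cut -d= -f2-)
       if [ -z "$value" ]; then
         echo "BLANK   $key"; status=1
       else
         echo "OK      $key (${#value} chars)"
       fi
     done
     return "$status"
   }

   # Set KEY=VALUE in FILE (replace the existing line, or append one).
   set_env_var() {
     local file=$1 key=$2 value=$3 tmp rc
     cp -p -- "$file" "$file.bak" || return 1       # keep the previous version
     tmp=$(mktemp) || return 1
     # awk reads KEY/VALUE from the environment, so the value is not re-parsed
     KEY=$key VALUE=$value awk '
       index($0, ENVIRON["KEY"] "=") == 1 { print ENVIRON["KEY"] "=" ENVIRON["VALUE"]; found = 1; next }
       { print }
       END { if (!found) print ENVIRON["KEY"] "=" ENVIRON["VALUE"] }
     ' "$file.bak" > "$tmp" && cat "$tmp" > "$file"   # overwrite in place: keeps owner and mode
     rc=$?
     rm -f -- "$tmp"
     return "$rc"
   }

For example::

    cd /data/LUStores
    check_env_file .env.prod SESSION_SECRET JWT_SECRET DB_PASSWORD DOMAIN
    set_env_var .env.prod JWT_SECRET "$(openssl rand -hex 64)"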

----

.. _qrh-migrations:

PROC-10  Database Migrations Not Running
-----------------------------------------

**Symptoms**: ``ERR-007``, ``column X does not exist`` errors in logs

☐ Check startup logs for migration output::

    docker compose -f docker-compose.prod.yml logs app | grep -E "migration|Drizzle|✅|ℹ️"

☐ ``migrations skipped — path error`` → fixed in ``deploy`` ≥ v1.1::

    git pull origin deploy
    docker compose -f docker-compose.prod.yml up -d --build app

☐ Apply missed migrations manually::

    docker compose -f docker-compose.prod.yml exec app \
        node -e "require('./dist/dbInit.js').initializeDatabase()"

☐ Verify the ``currentStock`` column is ``numeric``::

    docker compose -f docker-compose.prod.yml exec db \
        psql -U postgres -d university_inventory \
        -c "\d items" | grep currentStock
    # expected:  currentStock  | numeric(10,2)

☐ Still failing → **→ IT Support** with column name and error message

----

.. _qrh-licence:

PROC-11  Licence / Feature Not Available
-----------------------------------------

**Symptoms**: ``ERR-008``, 402 responses on Analytics, Notifications, or Import pages

☐ Check licence status in app: **Settings → Licence**

☐ Confirm ``LICENCE_KEY`` is set in ``.env.prod``::

    grep LICENCE_KEY .env.prod

☐ Key present but features locked → cache may not have warmed
(fixed in ``deploy`` ≥ v1.1); restart::

    docker compose -f docker-compose.prod.yml restart app

☐ Key expired → paste renewed token in **Settings → Licence → Save**, or
update ``.env.prod`` and recreate the app (``docker compose -f docker-compose.prod.yml up -d app``)

----

.. _qrh-crash-loop:

PROC-12  Container Crash Loop
------------------------------

**Symptoms**: ``ERR-010``, status shows "Restarting", never reaches "Up"

.. code-block:: bash

   docker compose -f docker-compose.prod.yml logs --tail=50 <service>

☐ ``Out of memory`` → :ref:`qrh-memory`

☐ ``Port already in use``::

    sudo lsof -i :<port>     # find conflicting PID
    sudo kill <PID>

☐ Dependency not ready → start in order::

    docker compose -f docker-compose.prod.yml stop
    docker compose -f docker-compose.prod.yml up -d db
    sleep 30
    docker compose -f docker-compose.prod.yml up -d

☐ Watch until stable::

    watch -n 3 'docker compose -f docker-compose.prod.yml ps'    # Ctrl+C to exit

☐ Still crashing after 3 attempts → **→ IT Support** with full logs
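
You can replace the fixed ``sleep 30`` in the dependency step with polling until the database actually answers. This is a minimal sketch; ``wait_until`` is a hypothetical helper. The example assumes ``pg_isready`` is available in the db container, as PROC-04 already does.

.. code-block:: bash

   # Hypothetical helper: re-runs a command every INTERVAL seconds until it
   # succeeds or TIMEOUT seconds have passed (returns 1 on timeout).
   wait_until() {
     local timeout=$1 interval=$2 waited=0
     shift 2
     until "$@" >/dev/null 2>&1; do
       if [ "$waited" -ge "$timeout" ]; then
         echo "timed out after ${timeout}s: $*" >&2
         return 1
       fi
       sleep "$interval"
       waited=$((waited + interval))
     done
     echo "ready after ${waited}s: $*"
   }

For example::

    docker compose -f docker-compose.prod.yml up -d db
    wait_until 120 5 docker compose -f docker-compose.prod.yml exec -T db pg_isready -U postgres
    docker compose -f docker-compose.prod.yml up -d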

----

.. _qrh-memory:

PROC-13  Out of Memory
-----------------------

**Symptoms**: ``ERR-011``, containers killed randomly, system very slow

.. code-block:: bash

   free -h
   docker stats --no-stream    # look for containers near their limit

☐ Restart memory-heavy services::

    docker compose -f docker-compose.prod.yml restart app redis

☐ Release page cache::

    sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

☐ Add temporary swap if none exists::

    swapon --show                 # no output means no swap is active
    sudo fallocate -l 2G /swapfile
    sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile

☐ If memory keeps growing → restart all services overnight::

    docker compose -f docker-compose.prod.yml restart

☐ Repeated OOM → **→ IT Support** (server needs more RAM)
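
This sketch filters ``docker stats`` output so you can see which container is closest to its limit without reading the full table. ``mem_hogs`` and its 80 % default are illustrative choices, not LUStores settings.

.. code-block:: bash

   # Hypothetical filter: reads "NAME MEM%" lines on stdin and prints the
   # containers at or above THRESHOLD percent (default 80).
   mem_hogs() {
     awk -v limit="${1:-80}" '
       { pct = $2; sub(/%$/, "", pct) }          # "91.25%" -> "91.25"
       pct + 0 >= limit + 0 { print $1, pct "%" }
     '
   }

For example: ``docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | mem_hogs 80``.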

----

.. _qrh-helm:

PROC-14  Helm / Rancher Deploy Stalled
----------------------------------------

**Symptoms**: ``ERR-015``, deployment stuck in "Deploying" for > 10 minutes

.. code-block:: bash

   kubectl get pods -n lustores
   kubectl describe pod <pod> -n lustores | tail -30    # check Events section

☐ ``ImagePullBackOff`` → Docker Hub credentials missing::

    kubectl get secret regcred -n lustores    # if missing, see deployment/docker-hub-setup.md

☐ ``Pending`` → check node resources::

    kubectl describe nodes | grep -A 5 "Allocated resources"

☐ ``CrashLoopBackOff`` → :ref:`qrh-crash-loop` (same steps, use ``kubectl logs`` instead)

☐ Force re-deploy via Rancher UI:
**Apps → lustores → ⋮ → Upgrade → Force Update**

☐ Roll back::

    helm rollback lustores -n lustores

☐ Still stuck → **→ IT Support**
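
This sketch filters the pod list down to the pods that need attention, which helps when the namespace has many pods. ``pods_not_ready`` is a hypothetical helper. It assumes the default ``kubectl get pods`` columns (NAME, READY, STATUS, ...).

.. code-block:: bash

   # Hypothetical filter: reads "kubectl get pods --no-headers" output on stdin
   # and prints pods that are not Running or are Running but not fully ready.
   pods_not_ready() {
     awk '{
       split($2, ready, "/")                  # READY column, e.g. "1/2"
       if ($3 == "Completed") next            # finished Job pods are fine
       if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
     }'
   }

For example: ``kubectl get pods -n lustores --no-headers | pods_not_ready``.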

----

Backup Procedures
-----------------

.. _qrh-backup:

Emergency Backup
~~~~~~~~~~~~~~~~

Run before any major change or maintenance window:

.. code-block:: bash

   cd /data/LUStores
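   # --clean --if-exists makes the dump drop existing objects before
   # recreating them, so it can be restored over the live database
   docker compose -f docker-compose.prod.yml exec -T db \
       pg_dump -U postgres --clean --if-exists university_inventory \
       | gzip > "backups/emergency_$(date +%Y%m%d_%H%M%S).sql.gz"
   ls -lht backups/ | head -n 2    # newest file; verify it is > 1 MB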
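
File size alone does not show that a backup is usable. This minimal sketch assumes a gzipped plain-text ``pg_dump`` like the one produced above. It tests the gzip stream and looks for the ``PostgreSQL database dump`` header that ``pg_dump`` writes at the top of plain-format output. ``verify_backup`` is a hypothetical helper. The only full test is a restore into a scratch database.

.. code-block:: bash

   # Hypothetical check for a gzipped, plain-format pg_dump file.
   verify_backup() {
     local file=$1 header
     # 1. the gzip stream must decompress without errors
     gzip -t -- "$file" 2>/dev/null || { echo "CORRUPT: $file is not a valid gzip file"; return 1; }
     # 2. plain-format pg_dump output starts with a "PostgreSQL database dump" comment
     header=$(gunzip -c -- "$file" 2>/dev/null | head -n 5)
     case $header in
       *"PostgreSQL database dump"*)
         echo "OK: $file ($(wc -c < "$file" | tr -d ' ') bytes compressed)" ;;
       *)
         echo "SUSPECT: $file does not start like a pg_dump file"; return 1 ;;
     esac
   }

For example: ``verify_backup "$(ls -t backups/*.sql.gz | head -n 1)"``.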

.. _qrh-restore:

Emergency Restore
~~~~~~~~~~~~~~~~~

.. danger::
   This **overwrites all current data**. Confirm before proceeding.

.. code-block:: bash

   ls -lh backups/                           # choose backup filename
   docker compose -f docker-compose.prod.yml stop app
   gunzip -c backups/<FILENAME>.sql.gz | \
       docker compose -f docker-compose.prod.yml exec -T db \
       psql -U postgres -d university_inventory \
       --single-transaction -v ON_ERROR_STOP=1    # any error rolls back the whole restore
   docker compose -f docker-compose.prod.yml start app
   # wait 60 s → test login → verify data

----

Preventive Maintenance
-----------------------

**Weekly** (5 minutes):

.. code-block:: bash

   df -h                                                       # disk space OK?
   docker compose -f docker-compose.prod.yml ps               # all Up?
   ls -lht backups/ | head -n 4                               # 3 newest backups recent?
   docker compose -f docker-compose.prod.yml logs --tail=50 app | grep -i error
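
This sketch turns the weekly ``df -h`` check into a yes/no answer by flagging any filesystem above a threshold. ``disk_alerts`` and its 85 % default are illustrative choices. It reads ``df -P`` output because that format keeps each filesystem on one line.

.. code-block:: bash

   # Hypothetical filter: reads `df -P` output on stdin and prints every
   # mount point whose Use% is at or above THRESHOLD (default 85).
   disk_alerts() {
     awk -v limit="${1:-85}" '
       NR == 1 { next }                       # skip the header line
       { use = $5; sub(/%$/, "", use) }       # "95%" -> "95"
       use + 0 >= limit + 0 { print $6, $5 }
     '
   }

For example: ``df -P | disk_alerts 85``. No output means nothing is at or above the threshold.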

**Monthly** (30 minutes):

.. code-block:: bash

   docker compose -f docker-compose.prod.yml exec certbot certbot certificates   # SSL expiry?
   docker compose -f docker-compose.prod.yml pull          # new image versions? (applied only by "up -d")
   sudo apt update && sudo apt list --upgradable            # OS patches?

----

Escalation — What to Send IT Support
--------------------------------------

Collect these three files and attach them to your support email:

.. code-block:: bash

   docker compose -f docker-compose.prod.yml logs --tail=300 > ~/lustores-logs.txt
   docker compose -f docker-compose.prod.yml ps              > ~/lustores-status.txt
   { df -h; free -h; }                                       > ~/lustores-disk.txt

Include in your message:

1. **Error code** (e.g. ``ERR-002``) or exact error text
2. **Procedure attempted** (e.g. ``PROC-04``)
3. The three output files above
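
Logs may contain secrets, for example a token printed at startup. Before attaching the files, you may want to mask obvious ``NAME=value`` secrets. This is a rough sketch; ``redact_secrets`` is a hypothetical helper. It only catches names containing SECRET, PASSWORD, TOKEN or KEY and misses passwords embedded in URLs, so skim the output before sending it.

.. code-block:: bash

   # Hypothetical filter: replaces the value after secret-looking names
   # (e.g. JWT_SECRET=..., DB_PASSWORD: ...) with [REDACTED].
   redact_secrets() {
     sed -E 's/([A-Za-z_]*(SECRET|PASSWORD|TOKEN|KEY|secret|password|token|key)[A-Za-z_]*[=:][[:space:]]*)[^[:space:]"]+/\1[REDACTED]/g'
   }

For example: ``redact_secrets < ~/lustores-logs.txt > ~/lustores-logs-redacted.txt``. Then attach the redacted file instead of the original.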