
Database

The backend uses PostgreSQL via SQLAlchemy with Flask-Migrate (Alembic) for schema management. In tests, an in-memory SQLite database is used.
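How the two are selected is environment-driven; a hypothetical sketch (only the `FLASK_SQLALCHEMY_DATABASE_URI` variable name is taken from this page — the helper and the fallback URI are invented for illustration):

```python
import os


def database_uri(testing: bool = False) -> str:
    """Pick the database URL; hypothetical helper, not the actual code."""
    if testing:
        return "sqlite://"  # in-memory SQLite, as used by the test suite
    return os.environ.get(
        "FLASK_SQLALCHEMY_DATABASE_URI",
        "postgresql://yaptide:yaptide@localhost:5432/yaptide",  # invented default
    )
```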

The database contains 12 tables organized around users, simulations, tasks, and results.

```
UserModel (polymorphic on auth_provider)
├── YaptideUserModel
└── KeycloakUserModel

UserModel ──< SimulationModel (polymorphic on platform)
              ├── CelerySimulationModel
              └── BatchSimulationModel

SimulationModel ──< TaskModel (polymorphic)
                    ├── CeleryTaskModel
                    └── BatchTaskModel

SimulationModel ──< InputModel
SimulationModel ──< EstimatorModel ──< PageModel
SimulationModel ──< LogfilesModel
ClusterModel    ──< BatchSimulationModel
```

User — base user table with polymorphic inheritance on auth_provider.

| Column | Type | Description |
| --- | --- | --- |
| `id` | Integer (PK) | Auto-incremented user ID |
| `username` | String | Unique username |
| `auth_provider` | String | `"yaptide"` or `"keycloak"` (discriminator) |

YaptideUser — native auth users.

| Column | Type | Description |
| --- | --- | --- |
| `id` | Integer (FK → User) | |
| `password_hash` | String | Werkzeug PBKDF2 hash |

KeycloakUser — Keycloak/PLGrid users.

| Column | Type | Description |
| --- | --- | --- |
| `id` | Integer (FK → User) | |
| `cert` | Text | SSH certificate (PEM) |
| `private_key` | Text | SSH private key (PEM) |

Simulation — base simulation table with polymorphic inheritance on platform.

| Column | Type | Description |
| --- | --- | --- |
| `id` | Integer (PK) | Auto-incremented |
| `job_id` | String | UUID, used as the external identifier |
| `user_id` | Integer (FK → User) | Owner |
| `title` | String | Simulation title |
| `platform` | String | `"DIRECT"` or `"BATCH"` (discriminator) |
| `input_type` | String | `"editor"` or `"files"` |
| `sim_type` | String | `"shieldhit"`, `"fluka"`, or `"topas"` |
| `job_state` | String | Current state (see lifecycle) |
| `start_time` | DateTime | Job submission time |
| `end_time` | DateTime | Job completion time |
| `update_key_hash` | String | Hashed JWT for worker auth |

CelerySimulation — direct (Celery) simulations.

| Column | Type | Description |
| --- | --- | --- |
| `id` | Integer (FK → Simulation) | |
| `merge_id` | String | Celery task ID for the merge step |

BatchSimulation — batch (Slurm) simulations.

| Column | Type | Description |
| --- | --- | --- |
| `id` | Integer (FK → Simulation) | |
| `cluster_id` | Integer (FK → Cluster) | Target HPC cluster |
| `job_dir` | String | Remote working directory on cluster |
| `array_id` | String | Slurm array job ID |
| `collect_id` | String | Slurm collect job ID |

Task — individual simulation tasks within a job.

| Column | Type | Description |
| --- | --- | --- |
| `id` | Integer (PK) | |
| `task_id` | Integer | 0-based task index |
| `simulation_id` | Integer (FK → Simulation) | |
| `task_state` | String | Task state |
| `requested_primaries` | BigInteger | Target number of primaries |
| `simulated_primaries` | BigInteger | Completed primaries |
| `estimated_time` | Integer | Estimated remaining seconds |
| `start_time` | DateTime | |
| `end_time` | DateTime | |
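`requested_primaries` and `simulated_primaries` together give a task's progress; a hypothetical helper (not part of the codebase) showing the arithmetic:

```python
def task_progress(simulated_primaries: int, requested_primaries: int) -> float:
    """Fraction of requested primaries already simulated, clamped to [0, 1].

    Hypothetical helper for illustration, not the actual yaptide code.
    """
    if requested_primaries <= 0:
        return 0.0  # avoid division by zero for not-yet-configured tasks
    return min(simulated_primaries / requested_primaries, 1.0)


print(task_progress(5_000, 20_000))  # 0.25
```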

CeleryTask — Celery-specific task data.

| Column | Type | Description |
| --- | --- | --- |
| `id` | Integer (FK → Task) | |
| `celery_id` | String | Celery task UUID |

BatchTask — batch-specific task data (minimal, inherits from Task).

Input — stores simulation input files.

| Column | Type | Description |
| --- | --- | --- |
| `id` | Integer (PK) | |
| `simulation_id` | Integer (FK → Simulation) | |
| `compressed_data` | LargeBinary | gzip-compressed JSON (file dict) |

Estimator — named result containers.

| Column | Type | Description |
| --- | --- | --- |
| `id` | Integer (PK) | |
| `simulation_id` | Integer (FK → Simulation) | |
| `name` | String | Estimator name |
| `file_name` | String | Original filename |
| `compressed_data` | LargeBinary | gzip-compressed metadata |

Page — individual scoring dimensions within an estimator.

| Column | Type | Description |
| --- | --- | --- |
| `id` | Integer (PK) | |
| `estimator_id` | Integer (FK → Estimator) | |
| `page_number` | Integer | 0-based page index |
| `page_name` | String | Descriptive name |
| `page_dimension` | Integer | Number of dimensions |
| `compressed_data` | LargeBinary | gzip-compressed result data (axes, values) |

Logfiles — simulation log output.

| Column | Type | Description |
| --- | --- | --- |
| `id` | Integer (PK) | |
| `simulation_id` | Integer (FK → Simulation) | |
| `compressed_data` | LargeBinary | gzip-compressed log text |

Cluster — registered HPC clusters.

| Column | Type | Description |
| --- | --- | --- |
| `id` | Integer (PK) | |
| `cluster_name` | String | e.g., `"ares"`, `"prometheus"` |

All large binary fields use gzip compression:

```python
import json

from yaptide.persistence.models import compress, decompress

# Store: serialize to JSON, encode, then gzip-compress
model.compressed_data = compress(json.dumps(data).encode())

# Retrieve: decompress, decode, then parse the JSON
data = json.loads(decompress(model.compressed_data).decode())
```

This reduces storage for large simulation results (estimator pages can be several MB uncompressed).
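The helpers themselves are presumably thin wrappers over the standard library's `gzip` module; an illustrative stand-in (the real implementations in `yaptide.persistence.models` may differ):

```python
import gzip
import json


def compress(payload: bytes) -> bytes:
    """Illustrative stand-in for yaptide's compress helper."""
    return gzip.compress(payload)


def decompress(blob: bytes) -> bytes:
    """Illustrative stand-in for yaptide's decompress helper."""
    return gzip.decompress(blob)


# Round-trip a small estimator-page-like payload (field names from the
# Page table above; the values are made up).
page = {"page_number": 0, "page_name": "Dose", "values": [0.1, 0.2, 0.3]}
blob = compress(json.dumps(page).encode())
assert json.loads(decompress(blob).decode()) == page
```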

The project uses Flask-Migrate (Alembic) for schema changes.

```sh
# After modifying models.py
poetry run flask --app yaptide.application db migrate -m "Add new column"

# Review the generated migration in migrations/versions/, then apply:
poetry run flask --app yaptide.application db upgrade
```
The typical workflow:

  1. Modify `persistence/models.py`
  2. Generate the migration: `flask db migrate -m "description"`
  3. Review the auto-generated migration script
  4. Test locally: `flask db upgrade`
  5. Commit the migration file alongside the model changes

Always back up the database before running migrations in production.

```sh
# Backup
docker compose exec postgresql pg_dump -U yaptide yaptide > backup.sql

# Apply migration
docker compose exec yaptide_flask flask --app yaptide.application db upgrade
```

To test a migration against a copy of the production database:

  1. Dump production: `pg_dump -U yaptide yaptide > prod_backup.sql`
  2. Load into a test database: `psql -U yaptide test_db < prod_backup.sql`
  3. Point `FLASK_SQLALCHEMY_DATABASE_URI` at the test DB
  4. Run `flask db upgrade`
  5. Verify the application works correctly

When running the database in Docker, access it with pgAdmin or psql:

```sh
# Direct psql access
docker compose exec postgresql psql -U yaptide yaptide

# Or with the develop compose file (includes pgAdmin on port 9999)
docker compose -f docker-compose.yml -f docker-compose-develop.yml up -d
# Open http://localhost:9999, log in with admin@admin.com / admin
```