# Current Architecture

This document describes what the code does today for simulation orchestration. It is based on direct inspection of the current implementations in:

- backend: https://github.com/yaptide/yaptide/tree/master/yaptide
- UI: https://github.com/yaptide/ui/tree/master/src
- existing docs for cross-checking: System Overview and Data Flow

In-scope execution paths:

- Direct execution via Celery workers (`/jobs/direct`)
- Batch execution via SLURM over SSH (`/jobs/batch`)

Not in scope here:

- Geant4 local browser execution internals
- Future architecture proposals (covered by ADRs and design docs)
## Runtime Components

| Component | Current role |
|---|---|
| UI (`ui`) | Submits jobs, polls statuses, fetches final results |
| Flask API (`yaptide/application.py`) | Accepts submissions, validates update tokens, stores state/results |
| Celery simulation worker | Runs simulator tasks and the merge task for direct jobs |
| Celery helper worker | Submits SLURM jobs and performs helper operations |
| Redis | Celery broker + result backend |
| PostgreSQL | Persistent storage for simulations, tasks, inputs, estimators, pages, logfiles |
| Remote cluster (SLURM) | Executes array jobs and collect jobs for batch mode |
## Direct Path (Celery)

### Submission and orchestration

Current behavior in code:

- UI submits to `/jobs/direct` (run type in `RemoteWorkerSimulationService`). `JobsDirect.post` validates the payload, clamps `ntasks`, and creates:
  - `CelerySimulationModel`
  - `CeleryTaskModel` rows
  - `InputModel`
- Backend creates per-task UUIDs and dispatches the workflow through `run_job`:
  - `group(run_single_simulation.s(...).set(task_id=...))`
  - `chord(..., chain(set_merging_queued_state, merge_results))`
- The merge task id is stored as `simulation.merge_id`.
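The fan-out/merge shape of this dispatch can be sketched without a broker. The snippet below models the `group`/`chord` semantics in plain Python (parallel per-task runs, then a merge callback once all finish); the function bodies and data shapes are illustrative assumptions, not the actual yaptide task signatures.

```python
# Broker-free sketch of the direct-path orchestration shape:
# N per-task simulations fan out, and a merge step runs once all complete.
# All payload shapes here are illustrative assumptions.
import uuid
from concurrent.futures import ThreadPoolExecutor


def run_single_simulation(task_id: str) -> dict:
    # Stand-in for the simulator subprocess; returns per-task estimator data.
    return {"task_id": task_id, "estimators": [1.0]}


def merge_results(results: list) -> dict:
    # Stand-in for the chord callback: runs only after every task completes.
    values = [r["estimators"][0] for r in results]
    return {"state": "COMPLETED", "merged": sum(values) / len(values)}


def run_job(ntasks: int) -> dict:
    task_ids = [str(uuid.uuid4()) for _ in range(ntasks)]  # per-task UUIDs
    with ThreadPoolExecutor() as pool:  # analogue of group(...): parallel fan-out
        results = list(pool.map(run_single_simulation, task_ids))
    return merge_results(results)  # analogue of the chord callback


print(run_job(4)["state"])  # -> COMPLETED
```

In the real code the fan-out and the barrier are provided by Celery's `group` and `chord` primitives over Redis, not by a thread pool.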
Key files:

- `yaptide/routes/celery_routes.py`
- `yaptide/celery/utils/manage_tasks.py`
### Task execution and progress updates

Each `run_single_simulation` task:

- Writes input files to a temporary working directory.
- Runs the simulator subprocess (`shieldhit` or `fluka`).
- Starts a monitor thread (`read_shieldhit_file` or `read_fluka_file`) to parse log progress.
- Sends task updates to Flask `/tasks` with `simulation_id`, `task_id`, `update_key`, and `update_dict`.
- Converts estimator objects to JSON-like dictionaries via `estimators_to_list`.
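The progress update described above can be sketched as a small helper that builds the POST to `/tasks`. The field names come from this document; the URL layout and the `update_dict` contents are assumptions for illustration.

```python
# Sketch of a monitor-thread progress update to the Flask /tasks endpoint.
# Field names (simulation_id, task_id, update_key, update_dict) follow the
# document; the exact payload contents are assumptions.
import json
import urllib.request


def build_task_update(backend_url: str, simulation_id: int, task_id: str,
                      update_key: str, update_dict: dict) -> urllib.request.Request:
    payload = {
        "simulation_id": simulation_id,
        "task_id": task_id,
        "update_key": update_key,    # token the backend validates before accepting
        "update_dict": update_dict,  # e.g. {"task_state": "RUNNING"}
    }
    return urllib.request.Request(
        f"{backend_url}/tasks",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # caller would pass the request to urllib.request.urlopen(...)
```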
Key files:

- `yaptide/celery/tasks.py`
- `yaptide/celery/utils/pymc.py`
- `yaptide/celery/utils/requests.py`
- `yaptide/routes/task_routes.py`
### Merge and persistence

After all direct tasks complete, `merge_results`:

- Sets the job state to `MERGING_RUNNING`.
- Averages per-task estimator pages in Python (`average_estimators`).
- Sends the merged payload to Flask `/results`; `ResultsResource.post` upserts estimators/pages into the DB and marks the simulation `COMPLETED`.
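The averaging step has the shape sketched below: the i-th page from each task is grouped and its numeric data averaged element-wise. The page structure (a dict with a `data` list) is an assumption for illustration, not the exact `average_estimators` signature.

```python
# Sketch of per-task estimator-page averaging (the shape of average_estimators).
# The page structure used here is an illustrative assumption.
def average_estimators(per_task_pages: list) -> list:
    """Average the i-th page across all tasks, element-wise over its data."""
    ntasks = len(per_task_pages)
    merged = []
    for pages in zip(*per_task_pages):  # group the same page from every task
        data = [sum(vals) / ntasks for vals in zip(*(p["data"] for p in pages))]
        merged.append({**pages[0], "data": data})
    return merged


tasks = [
    [{"name": "dose", "data": [1.0, 2.0]}],  # task 1 output
    [{"name": "dose", "data": [3.0, 4.0]}],  # task 2 output
]
print(average_estimators(tasks))  # -> [{'name': 'dose', 'data': [2.0, 3.0]}]
```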
Result retrieval path:

- UI fetches from `/results`.
- Backend serves from the DB (`EstimatorModel` + `PageModel`), with optional estimator/page filtering.
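The optional filtering can be pictured as narrowing the stored records by estimator name and page number before serializing. This in-memory sketch is an assumption about the filter semantics; the real handler queries `EstimatorModel`/`PageModel` in PostgreSQL.

```python
# Sketch of serving stored results with optional estimator/page filtering.
# The record shape is an illustrative assumption.
def get_results(estimators: list, estimator_name=None, page_number=None) -> list:
    out = []
    for est in estimators:
        if estimator_name is not None and est["name"] != estimator_name:
            continue  # estimator filter
        pages = est["pages"]
        if page_number is not None:
            pages = [p for p in pages if p["number"] == page_number]  # page filter
        out.append({**est, "pages": pages})
    return out


sample = [
    {"name": "dose", "pages": [{"number": 0}, {"number": 1}]},
    {"name": "fluence", "pages": [{"number": 0}]},
]
```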
Key files:

- `yaptide/celery/tasks.py`
- `yaptide/routes/common_sim_routes.py`
- `yaptide/persistence/models.py`
## Batch Path (SLURM)

### Submission and remote execution

`JobsBatch.post` enqueues `submit_job` on the helper worker.

`submit_job` currently:

- Opens an SSH connection using the user's certificate/private key.
- Creates a remote working directory in `$SCRATCH/yaptide_runs/<timestamp>`.
- Uploads zipped inputs, `watcher.py`, and `simulation_data_sender.py`.
- Generates scripts from templates and runs `sbatch` for:
  - the array job (N simulations)
  - the collect job (merge/convert/send)
- Posts `/jobs` updates with `array_id`, `collect_id`, and `job_dir`.
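The `sbatch` step above boils down to running the command remotely and capturing the job id from its stdout. The sketch below assumes a plain `ssh` invocation; the real code authenticates with the user's certificate/private key and runs template-generated scripts.

```python
# Sketch of remote sbatch submission and job-id parsing.
# Host and script paths are illustrative assumptions.
import re
import subprocess


def parse_sbatch_output(stdout: str) -> int:
    """Extract the job id from sbatch's standard confirmation line."""
    # sbatch prints e.g. "Submitted batch job 123456"
    match = re.search(r"Submitted batch job (\d+)", stdout)
    if match is None:
        raise RuntimeError(f"unexpected sbatch output: {stdout!r}")
    return int(match.group(1))


def submit_sbatch(host: str, script_path: str) -> int:
    """Run sbatch over SSH and return the SLURM job id."""
    result = subprocess.run(
        ["ssh", host, "sbatch", script_path],
        capture_output=True, text=True, check=True,
    )
    return parse_sbatch_output(result.stdout)
```

The returned id corresponds to what the backend stores as `array_id` or `collect_id` after submission.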
Key files:

- `yaptide/routes/batch_routes.py`
- `yaptide/batch/batch_methods.py`
- `yaptide/batch/shieldhit_string_templates.py`
- `yaptide/batch/fluka_string_templates.py`
### Batch progress and result delivery

- Per-task progress: `watcher.py` parses logs and POSTs to `/tasks`.
- Final results: the collect script runs `convertmc json --many` on output files and then posts to `/results` using `simulation_data_sender.py`.
## Current State Model

Current states in `EntityState`:

- `UNKNOWN`
- `PENDING`
- `RUNNING`
- `MERGING_QUEUED`
- `MERGING_RUNNING`
- `CANCELED`
- `COMPLETED`
- `FAILED`

State transitions are distributed across:

- the `/jobs/direct` and `/jobs/batch` submission handlers
- monitor updates through `/tasks`
- the merge stage (`set_merging_queued_state`, `merge_results`)
- the `/results` persistence handler
## Storage Model (Current)

Current persistence model in PostgreSQL:

- `Simulation` (+ `CelerySimulation` / `BatchSimulation`)
- `Task` (+ `CeleryTask` / `BatchTask`)
- `Input` (gzip-compressed JSON blob)
- `Estimator` (gzip-compressed metadata)
- `Page` (gzip-compressed page JSON)
- `Logfiles` (gzip-compressed JSON)

Compression/decompression is performed in model property getters/setters using JSON serialization plus gzip.
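The getter/setter pattern described here has the following shape: serialize to JSON and gzip on write, then reverse on read. The class below is a minimal standalone sketch; the real models wrap a PostgreSQL binary column rather than an instance attribute.

```python
# Sketch of the JSON+gzip property pattern used by the persistence models.
# The class is illustrative; real models store the blob in a DB column.
import gzip
import json


class CompressedJsonField:
    def __init__(self) -> None:
        self._blob: bytes = b""  # stand-in for the stored binary column

    @property
    def data(self):
        # Read path: gunzip the stored bytes, then parse JSON.
        return json.loads(gzip.decompress(self._blob))

    @data.setter
    def data(self, value) -> None:
        # Write path: serialize to JSON, then gzip the bytes.
        self._blob = gzip.compress(json.dumps(value).encode())


field = CompressedJsonField()
field.data = {"pages": [1, 2, 3]}
print(field.data)  # -> {'pages': [1, 2, 3]}
```

The trade-off is smaller rows at the cost of CPU on every read/write and opaque blobs that cannot be queried in SQL.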
## UI Behavior Today

Current UI orchestration behavior:

- Polls simulation statuses periodically (fast interval while active).
- Fetches full numerical results only when the simulation state is `COMPLETED`.
- Uses `/results` and the estimator/page filtering endpoints after completion.

Key files:

- `ui/src/services/RemoteWorkerSimulationService.ts`
- `ui/src/WrapperApp/components/Simulation/SimulationsGrid/SimulationsGridHelpers.ts`
- `ui/src/WrapperApp/components/Simulation/RecentSimulations.tsx`
## What This Means For Phase 1+

The as-is architecture is functional and supports two production paths, but it couples control-plane and data-plane concerns tightly:

- state updates and large result payloads both flow through the same backend APIs and the Celery/Redis stack
- merging is centralized in a single Python task
- numerical result delivery is effectively end-of-job, not true incremental partial-result streaming