Description
Current behaviour
Situation 1: Quotas are correctly checked at workflow creation time
Launch some workflows as the admin user:
reana-dev run-example -c r-d-r-roofit -w serial
After their successful execution, set a ridiculously low disk quota for the user:
kubectl exec -i -t deployment/reana-server -- flask reana-admin quota-set -e [email protected] -r disk -l 1 --admin-access-token $REANA_ACCESS_TOKEN
Check that the user is indeed over quota:
$ reana-client quota-show --resource disk -h
7.51 MiB out of 1 Bytes used (787660800%)
Then try to launch a new workflow, and see that you cannot:
$ reana-dev run-example -c r-d-r-roofit -w serial
...
==> ERROR: Cannot create workflow root6-roofit-serial-kubernetes:
User quota exceeded.
Resource: disk, usage: 7.51 MiB out of 1 Bytes used (787660800%)
So far so good.
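For reference, the creation-time refusal seen above amounts to a check of roughly the following shape. This is a minimal, self-contained sketch with hypothetical names (check_disk_quota, QuotaExceededError, human_readable), not REANA's actual code; the usage figure of 7876608 bytes is inferred from the 787660800% reading above (7876608 bytes ≈ 7.51 MiB).

# Hypothetical sketch of a creation-time disk quota check, for illustration only.

class QuotaExceededError(Exception):
    pass

def human_readable(num_bytes: int) -> str:
    # 7876608 / 1024**2 == 7.51 MiB
    return f"{num_bytes / 1024**2:.2f} MiB"

def check_disk_quota(usage_bytes: int, limit_bytes: int) -> None:
    if limit_bytes and usage_bytes > limit_bytes:
        # With usage 7876608 and limit 1: 7876608 * 100 // 1 == 787660800 (%)
        percent = usage_bytes * 100 // limit_bytes
        raise QuotaExceededError(
            f"User quota exceeded. Resource: disk, usage: "
            f"{human_readable(usage_bytes)} out of {limit_bytes} Bytes "
            f"used ({percent}%)"
        )

try:
    check_disk_quota(usage_bytes=7876608, limit_bytes=1)
except QuotaExceededError as err:
    print(err)  # reproduces the error message seen above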
Situation 2: Quotas are not checked at workflow execution time
Revert the user disk quota to a large value:
$ kubectl exec -i -t deployment/reana-server -- flask reana-admin quota-set -e [email protected] -r disk -l 1000000000 --admin-access-token $REANA_ACCESS_TOKEN
$ reana-client quota-show --resource disk -h
7.51 MiB out of 953.67 MiB used (1%)
Let's submit 20 workflows into the queue:
cd reana-demo-root6-roofit
reana-benchmark submit -w test -n 1-20
Let's start them:
reana-benchmark start -w test -n 1-20
Whilst some of the workflows are running, some pending, and some still in the queue, let's set the disk quota back to a ridiculously low value:
kubectl exec -i -t deployment/reana-server -- flask reana-admin quota-set -e [email protected] -r disk -l 1 --admin-access-token $REANA_ACCESS_TOKEN
The expectation is that the workflows still sitting in the queue should not be allowed to execute, because the user is out of disk quota. (The "running" and "pending" ones should be allowed to proceed, since they were already accepted for execution; the small disk quota overshoot that this causes is tolerable.)
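To make this expected policy concrete, here is a small sketch; the status strings mirror the REANA workflow statuses mentioned above, but the function itself is hypothetical, not actual REANA code.

# Hypothetical sketch of the expected admission policy, for illustration only.
ALREADY_ACCEPTED = {"running", "pending"}

def may_proceed(status: str, over_quota: bool) -> bool:
    if status in ALREADY_ACCEPTED:
        return True           # already accepted; small overshoot is tolerated
    return not over_quota     # "queued" workflows must still pass the quota check

assert may_proceed("running", over_quota=True)      # keeps running
assert may_proceed("pending", over_quota=True)      # keeps pending
assert not may_proceed("queued", over_quota=True)   # should be refused

The design point is that enforcement keys off workflow status: anything not yet accepted for execution must re-pass the quota check before being started.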
However, even the "queued" workflows are still nicely taken for execution, and all the workflows finish fine:
$ rcg list --filter name=test-
NAME     RUN_NUMBER  CREATED              STARTED              ENDED                STATUS
test-20  1           2025-04-03T09:14:47  2025-04-03T09:19:02  2025-04-03T09:19:17  finished
test-19  1           2025-04-03T09:14:47  2025-04-03T09:18:58  2025-04-03T09:19:13  finished
test-18  1           2025-04-03T09:14:47  2025-04-03T09:18:54  2025-04-03T09:19:09  finished
test-17  1           2025-04-03T09:14:47  2025-04-03T09:18:50  2025-04-03T09:19:05  finished
test-16  1           2025-04-03T09:14:47  2025-04-03T09:18:46  2025-04-03T09:19:01  finished
test-15  1           2025-04-03T09:14:46  2025-04-03T09:18:42  2025-04-03T09:18:57  finished
test-14  1           2025-04-03T09:14:46  2025-04-03T09:18:38  2025-04-03T09:18:53  finished
test-13  1           2025-04-03T09:14:46  2025-04-03T09:18:34  2025-04-03T09:18:49  finished
test-12  1           2025-04-03T09:14:46  2025-04-03T09:18:30  2025-04-03T09:18:45  finished
test-11  1           2025-04-03T09:14:46  2025-04-03T09:18:26  2025-04-03T09:18:41  finished
test-10  1           2025-04-03T09:14:45  2025-04-03T09:18:21  2025-04-03T09:18:37  finished
test-9   1           2025-04-03T09:14:45  2025-04-03T09:18:17  2025-04-03T09:18:33  finished
test-8   1           2025-04-03T09:14:45  2025-04-03T09:18:13  2025-04-03T09:18:29  finished
test-7   1           2025-04-03T09:14:45  2025-04-03T09:18:09  2025-04-03T09:18:25  finished
test-6   1           2025-04-03T09:14:45  2025-04-03T09:18:05  2025-04-03T09:18:21  finished
test-5   1           2025-04-03T09:14:44  2025-04-03T09:18:01  2025-04-03T09:18:16  finished
test-4   1           2025-04-03T09:14:44  2025-04-03T09:17:57  2025-04-03T09:18:12  finished
test-3   1           2025-04-03T09:14:44  2025-04-03T09:17:53  2025-04-03T09:18:08  finished
test-2   1           2025-04-03T09:14:44  2025-04-03T09:17:49  2025-04-03T09:18:04  finished
test-1   1           2025-04-03T09:14:44  2025-04-03T09:17:45  2025-04-03T09:18:00  finished
This may lead to a serious over-quota situation when a user submits many thousands of workflows.
Expected behaviour
The CPU and disk quota checks should be performed not only at workflow submission time, but also when workflows are taken from the queue and processed by the scheduler for execution. (Perhaps even upon rescheduling.)
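As a rough illustration of where such a check could live, here is a minimal, self-contained sketch of a scheduler loop that re-runs the disk quota check before starting each dequeued workflow. The in-memory deque, the helper names, and the byte figures are hypothetical and for illustration only; this is not REANA's actual scheduler code.

# Hypothetical sketch of re-checking the disk quota at scheduling time.
from collections import deque

class QuotaExceededError(Exception):
    pass

def check_disk_quota(usage_bytes: int, limit_bytes: int) -> None:
    # The same kind of check that already guards workflow creation.
    if limit_bytes and usage_bytes > limit_bytes:
        raise QuotaExceededError("User quota exceeded. Resource: disk")

def schedule(queued: deque, usage_bytes: int, limit_bytes: int) -> None:
    while queued:
        workflow = queued.popleft()
        try:
            check_disk_quota(usage_bytes, limit_bytes)
        except QuotaExceededError as err:
            print(f"Refusing {workflow}: {err}")  # expected for queued items
            continue
        print(f"Starting {workflow}")

# With the quota lowered to 1 byte mid-run, still-queued workflows are refused:
schedule(deque(["test-19", "test-20"]), usage_bytes=7876608, limit_bytes=1)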