
execution: previously submitted workflows are taken from the queue for execution even when the disk quota is exceeded #645

@tiborsimko

Description

Current behaviour

Situation 1: Quotas are correctly checked at workflow creation time

Launch some workflows as the admin user:

reana-dev run-example -c r-d-r-roofit -w serial

After they have finished successfully, set a ridiculously low disk quota for the user:

kubectl exec -i -t deployment/reana-server -- flask reana-admin quota-set -e [email protected] -r disk -l 1 --admin-access-token $REANA_ACCESS_TOKEN

Check that the user is indeed over quota:

$ reana-client quota-show  --resource disk -h
7.51 MiB out of 1 Bytes used (787660800%)
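The percentage is simply usage divided by limit. Working backwards, `787660800%` of a 1-byte limit means a usage of 7,876,608 bytes, which is the 7.51 MiB shown. The lines below reproduce that arithmetic; the byte count is inferred from the printed percentage rather than read from REANA, and the conversion assumes binary units (1 MiB = 1,048,576 bytes):

```shell
#!/usr/bin/env bash
# Disk usage in bytes, inferred from "787660800%" of a 1-byte limit (assumption).
usage_bytes=7876608
# Disk quota limit in bytes, as set with `quota-set ... -l 1`.
limit_bytes=1

# Percentage of the quota in use; exact in integer arithmetic for these values.
echo "$(( usage_bytes * 100 / limit_bytes ))%"   # 787660800%

# Usage in MiB (1 MiB = 1048576 bytes), to two decimals.
awk -v b="$usage_bytes" 'BEGIN { printf "%.2f MiB\n", b / 1048576 }'   # 7.51 MiB
```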

Then try to launch a new workflow, and see that you cannot:

$ reana-dev run-example -c r-d-r-roofit -w serial
...
==> ERROR: Cannot create workflow root6-roofit-serial-kubernetes:
User quota exceeded.
Resource: disk, usage: 7.51 MiB out of 1 Bytes used (787660800%)

So far so good.

Situation 2: Quotas are not checked during workflow execution time

Set the user's disk quota back to a large value:

$ kubectl exec -i -t deployment/reana-server -- flask reana-admin quota-set -e [email protected] -r disk -l 1000000000 --admin-access-token $REANA_ACCESS_TOKEN
$ reana-client quota-show  --resource disk -h
7.51 MiB out of 953.67 MiB used (1%)
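With the limit raised to 1,000,000,000 bytes, the limit itself is the 953.67 MiB reported, and the same usage is about 0.79% of it, which the client displays as 1% (presumably by rounding). A quick check, assuming the 7,876,608-byte usage inferred from the over-quota percentage in Situation 1 and binary MiB:

```shell
#!/usr/bin/env bash
# Disk usage in bytes (assumption: inferred from the earlier 787660800% output).
usage_bytes=7876608
# New disk quota limit in bytes, as set with `quota-set ... -l 1000000000`.
limit_bytes=1000000000

# Limit expressed in MiB (1 MiB = 1048576 bytes).
awk -v b="$limit_bytes" 'BEGIN { printf "%.2f MiB\n", b / 1048576 }'   # 953.67 MiB

# Percentage of the quota in use, before any display rounding.
awk -v u="$usage_bytes" -v l="$limit_bytes" 'BEGIN { printf "%.2f%%\n", 100 * u / l }'   # 0.79%
```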

Let's submit 20 workflows into the queue:

cd reana-demo-root6-roofit
reana-benchmark submit -w test -n 1-20

Let's start them:

reana-benchmark start -w test -n 1-20

While some of the workflows are running, some are pending, and some are still in the queue, set the disk quota back to a ridiculously low value:

kubectl exec -i -t deployment/reana-server -- flask reana-admin quota-set -e [email protected] -r disk -l 1 --admin-access-token $REANA_ACCESS_TOKEN

The expectation is that the later workflows still waiting in the queue should not be allowed to execute, because the user is over the disk quota. (The "running" and "pending" workflows should be allowed to continue, since they were already accepted for execution; the small disk quota overshoot this causes is tolerated.)
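The expected rule depends only on the workflow status at the moment the user is found to be over quota. A minimal shell sketch of that rule; the function `admission_decision`, its arguments, and the choice of a strict `usage < limit` comparison are assumptions for illustration, not REANA code:

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide what happens to a workflow given the user's
# current disk usage. Arguments: status, usage in bytes, limit in bytes.
admission_decision() {
  local status=$1 usage=$2 limit=$3
  case "$status" in
    running|pending)
      # Already accepted for execution: let it finish, tolerating a small overshoot.
      echo "continue" ;;
    queued)
      # Not yet started: only take it from the queue while strictly under quota.
      if (( usage < limit )); then echo "start"; else echo "reject"; fi ;;
    *)
      # Any other status is outside the scope of this rule.
      echo "ignore" ;;
  esac
}

admission_decision running 7876608 1            # continue
admission_decision queued  7876608 1            # reject
admission_decision queued  7876608 1000000000   # start
```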

However, even the "queued" workflows are still taken from the queue for execution, and all the workflows finish successfully:

$ rcg list --filter name=test-
NAME      RUN_NUMBER   CREATED               STARTED               ENDED                 STATUS
test-20   1            2025-04-03T09:14:47   2025-04-03T09:19:02   2025-04-03T09:19:17   finished
test-19   1            2025-04-03T09:14:47   2025-04-03T09:18:58   2025-04-03T09:19:13   finished
test-18   1            2025-04-03T09:14:47   2025-04-03T09:18:54   2025-04-03T09:19:09   finished
test-17   1            2025-04-03T09:14:47   2025-04-03T09:18:50   2025-04-03T09:19:05   finished
test-16   1            2025-04-03T09:14:47   2025-04-03T09:18:46   2025-04-03T09:19:01   finished
test-15   1            2025-04-03T09:14:46   2025-04-03T09:18:42   2025-04-03T09:18:57   finished
test-14   1            2025-04-03T09:14:46   2025-04-03T09:18:38   2025-04-03T09:18:53   finished
test-13   1            2025-04-03T09:14:46   2025-04-03T09:18:34   2025-04-03T09:18:49   finished
test-12   1            2025-04-03T09:14:46   2025-04-03T09:18:30   2025-04-03T09:18:45   finished
test-11   1            2025-04-03T09:14:46   2025-04-03T09:18:26   2025-04-03T09:18:41   finished
test-10   1            2025-04-03T09:14:45   2025-04-03T09:18:21   2025-04-03T09:18:37   finished
test-9    1            2025-04-03T09:14:45   2025-04-03T09:18:17   2025-04-03T09:18:33   finished
test-8    1            2025-04-03T09:14:45   2025-04-03T09:18:13   2025-04-03T09:18:29   finished
test-7    1            2025-04-03T09:14:45   2025-04-03T09:18:09   2025-04-03T09:18:25   finished
test-6    1            2025-04-03T09:14:45   2025-04-03T09:18:05   2025-04-03T09:18:21   finished
test-5    1            2025-04-03T09:14:44   2025-04-03T09:18:01   2025-04-03T09:18:16   finished
test-4    1            2025-04-03T09:14:44   2025-04-03T09:17:57   2025-04-03T09:18:12   finished
test-3    1            2025-04-03T09:14:44   2025-04-03T09:17:53   2025-04-03T09:18:08   finished
test-2    1            2025-04-03T09:14:44   2025-04-03T09:17:49   2025-04-03T09:18:04   finished
test-1    1            2025-04-03T09:14:44   2025-04-03T09:17:45   2025-04-03T09:18:00   finished

This may lead to a serious over-quota situation when a user submits many thousands of workflows.

Expected behaviour

CPU and disk quotas should be checked not only at workflow submission time, but also when workflows are taken from the queue and processed by the scheduler for execution (and perhaps also when they are rescheduled).
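A minimal sketch of the proposed behaviour: a consumer that re-checks both quotas each time it takes a workflow off the queue, rather than relying on the check made at creation time. It simulates Situation 2 by lowering the disk limit to 1 byte partway through the queue. The function names, the in-memory queue, the usage numbers, and the convention that a limit of 0 means "no limit" are all hypothetical; this models the expected behaviour only and does not reflect REANA's scheduler code.

```shell
#!/usr/bin/env bash
# Hypothetical quota state for one user. Disk values are bytes; the CPU unit
# does not matter for this comparison. A limit of 0 means "no limit" here.
disk_usage=7876608
disk_limit=1000000000
cpu_usage=0
cpu_limit=0

# Succeed only if neither quota is exceeded.
# Arguments: disk usage, disk limit, CPU usage, CPU limit.
within_quota() {
  (( $2 == 0 || $1 < $2 )) || return 1
  (( $4 == 0 || $3 < $4 )) || return 1
}

# Take workflows off the queue one at a time and re-check quotas for each.
# First argument: dequeue position at which the disk limit is lowered to
# 1 byte (simulating the admin's quota-set call); the rest is the queue.
process_queue() {
  local lower_at=$1 limit=$disk_limit i=0 name
  shift
  for name in "$@"; do
    i=$(( i + 1 ))
    if (( i == lower_at )); then limit=1; fi
    if within_quota "$disk_usage" "$limit" "$cpu_usage" "$cpu_limit"; then
      echo "$name started"
    else
      echo "$name rejected: quota exceeded"
    fi
  done
}

process_queue 4 test-1 test-2 test-3 test-4 test-5
```

Whether a rejected workflow should be marked as failed or left in the queue until the quota allows it is a separate design choice; the sketch only prints the decision.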
