Rapidly uninstalling-reinstalling triggers can leave Astarte in a corrupt trigger state #1535

@davidebriani

Description

We’ve observed that when Edgehog, an Astarte API client, performs rapid uninstall-and-reinstall cycles on Astarte triggers, Astarte can end up in a corrupt state: triggers appear installed according to the APIs, but they do not fire and no messages are delivered. As a result, trigger-driven workflows such as Edgehog's OTA campaigns fail silently most of the time and succeed only occasionally.

Investigation suggests that Astarte’s trigger create and delete paths are synchronous, so in principle they should be safe, yet operations in quick succession still appear to leave the internal state inconsistent.
One possible cause is that some of Astarte's internal caches are not properly invalidated and refreshed with the updated trigger resources, e.g. the state held in DUP (Data Updater Plant) processes.
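
To make the cache hypothesis concrete, here is a toy model of how a stale cache could produce this symptom. It is purely illustrative and is not Astarte's code: `TriggerStore` and `CachingConsumer` are hypothetical names, and the sketch assumes that a reinstalled trigger gets a fresh id while a consumer keeps resolving events through an id it cached earlier.

```python
import uuid


class TriggerStore:
    """Hypothetical stand-in for the database holding trigger definitions."""

    def __init__(self):
        self.by_id = {}    # trigger id -> trigger name
        self.by_name = {}  # trigger name -> trigger id

    def install(self, name):
        trigger_id = str(uuid.uuid4())  # assumption: each install gets a fresh id
        self.by_id[trigger_id] = name
        self.by_name[name] = trigger_id
        return trigger_id

    def delete(self, name):
        trigger_id = self.by_name.pop(name)
        del self.by_id[trigger_id]


class CachingConsumer:
    """Hypothetical component that caches name -> id and never invalidates it."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def handle_event(self, name):
        # Resolve the id once, then keep using the cached value.
        if name not in self.cache:
            self.cache[name] = self.store.by_name[name]
        trigger_id = self.cache[name]
        if trigger_id not in self.store.by_id:
            return ("error", "trigger_not_found")
        return ("ok", trigger_id)


store = TriggerStore()
consumer = CachingConsumer(store)

store.install("ota-update")
first = consumer.handle_event("ota-update")   # cache is populated here

store.delete("ota-update")
store.install("ota-update")                   # same name, new id
listed = "ota-update" in store.by_name        # the store still lists the trigger
second = consumer.handle_event("ota-update")  # cached id is now stale
```

In this model the trigger is still listed after reinstallation, yet event handling fails with a not-found error, which resembles the behavior reported here. Whether Astarte actually behaves this way is exactly what remains to be confirmed.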

The issue was found while using Edgehog v0.9.3 and Astarte v1.2-snapshot.

Expected Behavior

  • After trigger deletion and recreation, the system remains consistent and all installed triggers are usable. Event delivery is not impacted.

Actual Behavior

  • Triggers appear as installed through the APIs, but events are not delivered to the configured endpoint (Edgehog in this case).
  • Upon deletion and re-creation of triggers, Trigger Engine emits warnings such as:
    |WARN| Trigger not found: "5a7d06ee-1d3d-420b-a1e0-72b053fbe612" function=retrieve_trigger_configuration/2
    |WARN| Error while processing event: {:error, :trigger_not_found} function=handle_simple_event/6
  • State remains inconsistent until manual remediation steps are taken to clean the resources present in the database.
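
When checking whether a deployment is affected, it may help to pull the missing trigger ids out of the Trigger Engine logs. The sketch below assumes only the warning format quoted above; `missing_trigger_ids` is a hypothetical helper, and how the log lines are collected is left open.

```python
import re
from collections import Counter

# Matches the "Trigger not found" warning format quoted above.
NOT_FOUND = re.compile(r'Trigger not found: "([0-9a-fA-F-]{36})"')


def missing_trigger_ids(log_lines):
    """Count how often each trigger id is reported as not found."""
    counts = Counter()
    for line in log_lines:
        match = NOT_FOUND.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    '|WARN| Trigger not found: "5a7d06ee-1d3d-420b-a1e0-72b053fbe612" function=retrieve_trigger_configuration/2',
    '|WARN| Error while processing event: {:error, :trigger_not_found} function=handle_simple_event/6',
]
counts = missing_trigger_ids(sample)
```

Ids that keep showing up here while the trigger is still listed by the APIs would be candidates for the inconsistent state described in this issue.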

Impact

  • Astarte clients that use triggers risk corrupting Astarte's state and breaking message delivery whenever they change trigger definitions, and possibly trigger delivery policy definitions as well.

Open Questions

  • Are there known race conditions in Trigger Engine or Realm Management when triggers are deleted and recreated in rapid succession?
  • Could internal caches or propagation delays cause a “phantom trigger” state, where the API reports the trigger as installed but retrieval resolves to not found?
  • Would introducing stronger transactional guarantees, sequencing, or debounce on delete-create in the trigger path mitigate this?
  • Are there recommended guardrails or best practices for clients performing trigger reconciliation to avoid corrupt states?
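
As a starting point for the guardrail question, one client-side option is to reconcile triggers idempotently: skip the delete-create cycle when the installed definition already matches, and serialize the remaining operations. This is a minimal sketch under stated assumptions, not a confirmed fix. `InMemoryTriggerClient` and `TriggerReconciler` are hypothetical names, no real Astarte API is called, and the definition dictionaries are placeholders.

```python
import threading


class InMemoryTriggerClient:
    """Hypothetical stand-in for a trigger API client; records each call."""

    def __init__(self):
        self.triggers = {}
        self.calls = []

    def get(self, name):
        return self.triggers.get(name)

    def create(self, name, definition):
        self.calls.append(("create", name))
        self.triggers[name] = definition

    def delete(self, name):
        self.calls.append(("delete", name))
        del self.triggers[name]


class TriggerReconciler:
    def __init__(self, client):
        self.client = client
        self._lock = threading.Lock()  # one reconciliation at a time

    def ensure(self, name, definition):
        """Make `name` match `definition`, touching it only when needed."""
        with self._lock:
            current = self.client.get(name)
            if current == definition:
                return "unchanged"  # avoid a needless delete-create cycle
            if current is not None:
                self.client.delete(name)
            self.client.create(name, definition)
            # Read back before reporting success.
            if self.client.get(name) != definition:
                raise RuntimeError(f"trigger {name!r} did not converge")
            return "updated" if current is not None else "created"


client = InMemoryTriggerClient()
reconciler = TriggerReconciler(client)
definition = {"action": {"http_url": "https://example.com/hook"}}  # placeholder

results = [
    reconciler.ensure("ota-update", definition),
    reconciler.ensure("ota-update", dict(definition)),  # same content again
    reconciler.ensure("ota-update", {"action": {"http_url": "https://example.com/v2"}}),
]
```

Repeated reconciliation with an unchanged definition issues no delete or create calls, which reduces how often the rapid cycle happens at all. Since this issue reports that the API still lists the trigger in the corrupt state, the read-back step alone would not detect the failure, so an end-to-end delivery check would likely still be needed.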

Metadata

  • Assignees: none
  • Labels: bug