Implement sample by size on ray runner #5690

@colin-ho

Description

Is your feature request related to a problem?

Currently, sampling by size (i.e., by a fixed number of rows) only works on the native runner.

Describe the solution you'd like

I would like to do sampling by size on the ray runner.

Describe alternatives you've considered

No response

Additional Context

Currently, sampling by size works locally by assigning a random key to each row and taking the rows with the smallest keys. We can extend this to distributed execution by gathering the local samples from each partition and then doing another round of min over the gathered rows.
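
To make the idea concrete, here is a rough sketch in plain Python of the two-round approach described above. It is not the runner's actual code: the function names (`local_sample`, `merge_samples`, `sample_by_size`) and the use of lists as stand-ins for partitions are assumptions for illustration only.

```python
import heapq
import random
from typing import Any, List, Optional, Sequence, Tuple

# A row paired with its random key. Hypothetical type, for illustration only.
Keyed = Tuple[float, Any]


def local_sample(rows: Sequence[Any], k: int, rng: random.Random) -> List[Keyed]:
    """Round 1 (per partition): key every row, keep the k smallest keys."""
    keyed = [(rng.random(), row) for row in rows]
    # The keys are kept so the second round can compare across partitions.
    return heapq.nsmallest(k, keyed, key=lambda kr: kr[0])


def merge_samples(local_samples: Sequence[List[Keyed]], k: int) -> List[Any]:
    """Round 2 (after gathering): keep the k smallest keys overall."""
    gathered = [kr for sample in local_samples for kr in sample]
    smallest = heapq.nsmallest(k, gathered, key=lambda kr: kr[0])
    return [row for _, row in smallest]


def sample_by_size(
    partitions: Sequence[Sequence[Any]], k: int, seed: Optional[int] = None
) -> List[Any]:
    """Sample up to k rows without replacement across all partitions."""
    rng = random.Random(seed)
    # In a distributed setting each call below would run on its own worker.
    local = [local_sample(part, k, rng) for part in partitions]
    return merge_samples(local, k)


if __name__ == "__main__":
    parts = [list(range(0, 10)), list(range(10, 13)), []]
    print(sample_by_size(parts, k=5, seed=0))
```

The second round only sees at most k rows per partition, yet the result matches a global min over all rows: any row among the k smallest keys overall is also among the k smallest keys in its own partition, so it always survives the first round.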

Would you like to implement a fix?

Yes
