Functions to get all darwin cut notes based on image dimensions - in python and spark for efficient parallel processing

HackTheStacks/darwin-image-preprocessing

darwin-image-preprocessing

Functions that extract all Darwin cut notes based on image dimensions and discard full-page notes (non-cut notes). Each image's dimensions are compared to the mean image dimensions within its folder. Written in PySpark for efficient parallel processing, because the dataset is ~350 GB and ~60k images.
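The folder-mean comparison described above can be sketched in plain Python. This is a minimal illustration, not the repository's actual code: the function name `split_cut_notes`, the `(folder, filename, width, height)` record layout, the `ratio` parameter, and the rule that an image counts as a cut note when it is smaller than the folder mean in at least one dimension are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def split_cut_notes(records, ratio=1.0):
    """Split images into cut notes and full-page notes, folder by folder.

    records: iterable of (folder, filename, width, height) tuples
             (hypothetical layout chosen for this sketch).
    ratio:   hypothetical scaling factor applied to the folder mean.
    """
    # Group image dimensions by the folder they live in.
    by_folder = defaultdict(list)
    for folder, name, width, height in records:
        by_folder[folder].append((name, width, height))

    cut_notes, full_pages = [], []
    for folder, images in by_folder.items():
        mean_w = mean(w for _, w, _ in images)
        mean_h = mean(h for _, _, h in images)
        for name, w, h in images:
            # Assumption: a cut note is smaller than the folder mean
            # in at least one dimension; everything else is a full page.
            is_cut = w < ratio * mean_w or h < ratio * mean_h
            (cut_notes if is_cut else full_pages).append((folder, name))
    return cut_notes, full_pages


records = [
    ("folder_a", "page1.jpg", 1000, 1400),
    ("folder_a", "page2.jpg", 1000, 1400),
    ("folder_a", "cut1.jpg", 400, 300),
]
cut_notes, full_pages = split_cut_notes(records)
print(cut_notes)
print(full_pages)
```

Because each folder is processed independently, the same per-folder logic lends itself to being distributed across Spark workers; that wiring is not shown here.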
