🚀 DGW Scrapper

An automated web scraper that extracts LPJ data from the DGW Spartan platform. It downloads activity reports and processes them into organized Excel spreadsheets.

Python Version License Code style: black

✨ Features

  • 🤖 Automated Login: Seamlessly authenticate with the DGW Spartan platform
  • 📅 Date Range Filtering: Extract data for specific time periods
  • 📊 Excel Export: Generate clean, organized Excel reports
  • 🎨 Rich CLI Output: Beautiful terminal interface with progress tracking
  • ⚡ Fast Processing: Efficient Playwright-based browser automation
  • 🔄 Batch Processing: Handle multiple LPJ documents in one run
  • 📁 Organized Output: Automatically structured file naming and storage

🔧 Prerequisites

Before you begin, ensure you have the following installed:

  • Python 3.8 or higher
  • pip (Python package installer)
  • Valid DGW Spartan account credentials

📦 Installation

  1. Clone the repository
git clone https://github.com/alifnuryana/dgw-scrapper.git
cd dgw-scrapper
  2. Install dependencies
pip install -r requirements.txt
  3. Install Playwright browsers
playwright install chromium

🚀 Usage

Basic Command

python main.py --email YOUR_EMAIL --password YOUR_PASSWORD --from_date DD/MM/YYYY --to_date DD/MM/YYYY

Example

python main.py --email user@example.com --password mypassword123 --from_date 01/01/2024 --to_date 31/01/2024

Parameters

| Parameter | Required | Description | Format |
|-----------|----------|-------------|--------|
| --email | ✅ Yes | Your DGW Spartan email | string |
| --password | ✅ Yes | Your DGW Spartan password | string |
| --from_date | ✅ Yes | Start date for data extraction | DD/MM/YYYY |
| --to_date | ✅ Yes | End date for data extraction | DD/MM/YYYY |
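
The four required flags could be parsed along these lines. This is a minimal sketch using argparse, not the actual contents of main.py: the helper names parse_date and parse_args are hypothetical, and the check that --from_date is not later than --to_date is an assumption rather than documented behavior.

```python
import argparse
from datetime import date, datetime


def parse_date(value: str) -> date:
    """Parse a DD/MM/YYYY string; argparse reports a clean error on bad input."""
    try:
        return datetime.strptime(value, "%d/%m/%Y").date()
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected DD/MM/YYYY, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="DGW Spartan LPJ scraper (sketch)")
    parser.add_argument("--email", required=True, help="DGW Spartan email")
    parser.add_argument("--password", required=True, help="DGW Spartan password")
    parser.add_argument("--from_date", required=True, type=parse_date,
                        help="start date, DD/MM/YYYY")
    parser.add_argument("--to_date", required=True, type=parse_date,
                        help="end date, DD/MM/YYYY")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed sanity check: an inverted range would match no documents.
    if args.from_date > args.to_date:
        parser.error("--from_date must not be later than --to_date")
    return args
```

Passing type=parse_date means an impossible date such as 31/02/2024 is rejected before any browser is launched.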

⚙️ Configuration

The scraper is configured to:

  • Navigate to the "Sudah Diproses" (Processed) tab
  • Filter by document type: LPJ
  • Extract the following data:
    • Activity Name
    • PO Name
    • Total Amount
    • Activity Count

Output Directory

All generated files are saved in the output/ directory, which is automatically created if it doesn't exist. The directory is cleaned before each run to ensure fresh data.
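
The create-and-clean behavior described above can be sketched with the standard library. The function name reset_output_dir is hypothetical, and the real scraper may clean the folder differently (for example by deleting files one by one).

```python
import shutil
from pathlib import Path


def reset_output_dir(path: str = "output") -> Path:
    """Remove the directory if it exists, then recreate it empty."""
    out = Path(path)
    if out.exists():
        shutil.rmtree(out)       # drops files left over from a previous run
    out.mkdir(parents=True)      # also covers the "doesn't exist yet" case
    return out
```

Because the folder is emptied on every run, reports worth keeping should be copied elsewhere before the scraper is started again.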

📄 Output Format

File Naming Convention

Files are named using the following pattern:

{YYYY - Month} - {Submitted By} - {Activity Type 1} - {Activity Type 2} - {Proposal Name}.xlsx

Example:

2024 - January - John Doe - Workshop - Training - Employee Development.xlsx
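
As an illustration, the pattern can be assembled like this. It is a sketch under assumptions: build_filename and its parameters are hypothetical names, the source does not say which date supplies the year and month, and replacing characters that are invalid in file names is a precaution added here, not documented behavior.

```python
import re
from datetime import date

# English month names, so the result does not depend on the system locale.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def build_filename(period: date, submitted_by: str, activity_type_1: str,
                   activity_type_2: str, proposal_name: str) -> str:
    """Join the five pattern fields with ' - ' and append the .xlsx extension."""
    parts = [
        f"{period.year} - {MONTHS[period.month - 1]}",
        submitted_by,
        activity_type_1,
        activity_type_2,
        proposal_name,
    ]
    # Assumed sanitizing step: swap characters that Windows or POSIX reject.
    cleaned = [re.sub(r'[\\/:*?"<>|]', "_", part).strip() for part in parts]
    return " - ".join(cleaned) + ".xlsx"


print(build_filename(date(2024, 1, 15), "John Doe", "Workshop",
                     "Training", "Employee Development"))
```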

Excel Structure

The generated Excel files contain the following columns:

| Column | Description |
|--------|-------------|
| Activity Name | Name of the activity |
| PO Name | Purchase Order name |
| Total | Total amount (in Rupiah) |
| Count | Number of activities |

Data is automatically:

  • ✅ Cleaned and formatted
  • ✅ Grouped by Activity Name and PO Name
  • ✅ Aggregated with sum and count calculations
  • ✅ Converted to proper numeric formats
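
The four steps above amount to a group-by with a sum and a count. The sketch below shows the idea with the standard library only; the project itself uses pandas, where a groupby would play the same role. The raw amount format ("Rp 1.500.000", dots as thousands separators, no decimals) and the names parse_rupiah and summarize are assumptions.

```python
import re
from collections import defaultdict


def parse_rupiah(text: str) -> int:
    """Convert a string such as 'Rp 1.500.000' to the integer 1500000."""
    digits = re.sub(r"\D", "", text)   # keep digits only
    return int(digits) if digits else 0


def summarize(rows):
    """Group (activity, po, amount_text) rows by (activity, po)."""
    totals = defaultdict(lambda: {"Total": 0, "Count": 0})
    for activity, po, amount_text in rows:
        key = (activity.strip(), po.strip())          # cleaning step
        totals[key]["Total"] += parse_rupiah(amount_text)
        totals[key]["Count"] += 1
    return [
        {"Activity Name": activity, "PO Name": po, **agg}
        for (activity, po), agg in sorted(totals.items())
    ]


rows = [
    ("Workshop ", "PO-1", "Rp 1.500.000"),
    ("Workshop", "PO-1", "Rp 500.000"),
    ("Training", "PO-2", "Rp 250.000"),
]
for record in summarize(rows):
    print(record)
```

Each returned record carries the same four columns as the Excel sheet: Activity Name, PO Name, Total, and Count.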

🛠️ Troubleshooting

Common Issues

Browser fails to launch

Ensure Playwright browsers are installed:

playwright install chromium

Login fails

  • Verify your email and password are correct
  • Check if your account has access to the Spartan platform
  • If your email or password contains characters the shell treats specially (such as $, !, or &), wrap the value in single quotes or escape those characters
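
For the escaping point, Python's shlex module can produce a command line that a POSIX shell passes through unchanged. This is a sketch for sh-style shells only; Windows cmd and PowerShell use different quoting rules, and build_command is a hypothetical helper, not part of the project.

```python
import shlex


def build_command(email: str, password: str, from_date: str, to_date: str) -> str:
    """Return a POSIX-shell-safe command line for main.py."""
    args = [
        "python", "main.py",
        "--email", email,
        "--password", password,
        "--from_date", from_date,
        "--to_date", to_date,
    ]
    # shlex.join quotes only the arguments that actually need it.
    return shlex.join(args)


print(build_command("user@example.com", "p@ss$word!", "01/01/2024", "31/01/2024"))
```

Here the password is wrapped in single quotes, so the shell does not expand $word or treat ! as history expansion.
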

TimeoutError during scraping

This can occur if:

  • Network connection is slow
  • The page takes longer to load
  • An item has no data table

The scraper will skip problematic items and continue processing others.
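
The skip-and-continue behavior can be pictured as a loop that catches the timeout per item. This is a sketch, not the project's code: process_all and scrape_one are hypothetical names, and the built-in TimeoutError stands in for the timeout exception that Playwright raises, which is a separate class.

```python
def process_all(items, scrape_one):
    """Apply scrape_one to every item; record timeouts instead of aborting."""
    results, skipped = [], []
    for item in items:
        try:
            results.append(scrape_one(item))
        except TimeoutError as exc:
            # Stand-in exception type; with Playwright, catch its own
            # TimeoutError class here instead.
            skipped.append((item, str(exc)))
    return results, skipped


def fake_scrape(item):
    """Toy scraper: item 'b' simulates a page with no data table."""
    if item == "b":
        raise TimeoutError("no data table")
    return item.upper()


results, skipped = process_all(["a", "b", "c"], fake_scrape)
print(results)
print(skipped)
```

Collecting the skipped items makes it possible to print a summary at the end of the run, so missing reports are easy to spot.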

Empty output folder

  • Check if the date range contains any LPJ documents
  • Verify the filter settings match available documents
  • Review the console output for any error messages

🤝 Contributing

Contributions are welcome! Here's how you can help:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Commit your changes (git commit -m 'Add some amazing feature')
  5. Push to the branch (git push origin feature/amazing-feature)
  6. Open a Pull Request

Development Guidelines

  • Follow PEP 8 style guidelines
  • Add comments for complex logic
  • Test your changes thoroughly
  • Update documentation as needed

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with Playwright for reliable browser automation
  • Uses pandas for efficient data processing
  • Enhanced with Rich for beautiful terminal output

📧 Contact

Alif Nuryana - @alifnuryana

Project Link: https://github.com/alifnuryana/dgw-scrapper


Made with ❤️ by Alif Nuryana
