
[Feature Request]: Support for XML to Parquet Conversion in Dataflow Templates #3013

@paragpratim


Related Template(s)

Convert file formats between Avro, Parquet & CSV

What feature(s) are you requesting?

Summary:
Please extend the existing template for converting file formats between Avro, Parquet, and CSV with support for converting XML files to Parquet.

Use Case:
Many data processing workflows require ingesting and transforming XML data before loading it into analytics platforms such as BigQuery. Enabling XML to Parquet conversion within Dataflow Templates would streamline these workflows and expand the utility of the template.

Proposed Solution:
Use Apache Beam's XmlIO connector, which Dataflow pipelines can run, to read XML files and write the records out as Parquet. The template should let users specify parameters such as the root element and record element so it can accommodate various XML schemas.
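
To illustrate how the root element and record element parameters would drive the conversion, here is a minimal Python sketch that uses only the standard library to turn XML into flat rows. It is not the template's implementation: the function names, the flattening rule (attributes and direct children become columns), and the sample data are all assumptions made for illustration. In the Java template, the reading step would presumably go through Beam's XmlIO, which takes a root element and a record element in a similar way.

```python
import xml.etree.ElementTree as ET


def xml_records(xml_text, root_element, record_element):
    """Yield one flat dict per <record_element> found under <root_element>.

    Hypothetical helper: the flattening rule here is an assumption.
    """
    root = ET.fromstring(xml_text)
    if root.tag != root_element:
        raise ValueError(f"expected root <{root_element}>, found <{root.tag}>")
    for record in root.iter(record_element):
        # Attributes and direct child elements both become columns.
        row = dict(record.attrib)
        for child in record:
            row[child.tag] = (child.text or "").strip()
        yield row


def write_parquet(rows, path):
    """Write rows to a Parquet file.

    Assumes the third-party pyarrow package is installed; it is imported
    lazily so the parsing code above works without it.
    """
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.Table.from_pylist(list(rows)), path)


# Made-up sample document for illustration only.
SAMPLE = """<orders>
  <order id="1"><customer>Ada</customer><total>19.99</total></order>
  <order id="2"><customer>Lin</customer><total>5.00</total></order>
</orders>"""

rows = list(xml_records(SAMPLE, root_element="orders", record_element="order"))
```

Every value stays a string in this sketch. A real template would also need a schema (for example an Avro schema or a JAXB-annotated class) to give the Parquet columns proper types and to handle nested elements.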

Benefits:

  • Simplifies ETL pipelines for XML data sources.
  • Facilitates direct loading of XML data into BigQuery via Parquet.

Additional Context:
This enhancement would be valuable for users dealing with XML data and looking for a scalable, managed solution to convert and load data into BigQuery.
