Description
Related Template(s)
Convert file formats between Avro, Parquet & CSV
What feature(s) are you requesting?
Summary:
I would like to request a new feature for the existing template that converts file formats between Avro, Parquet, and CSV: support for converting XML files to Parquet.
Use Case:
Many data processing workflows require ingesting and transforming XML data before loading it into analytics platforms such as BigQuery. Enabling XML to Parquet conversion within Dataflow Templates would streamline these workflows and expand the utility of the template.
Proposed Solution:
Use Apache Beam's XmlIO connector to read XML files and write the records as Parquet. The implementation should let users specify parameters such as the root element and the record element so the template can handle different XML schemas.
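To make the two parameters concrete, here is a minimal sketch of how a root element and a record element could select the rows to convert. It uses only the Python standard library and is not Beam code; in the Beam Java SDK the corresponding settings are `XmlIO.read().withRootElement(...)` and `.withRecordElement(...)`. The function name `read_records`, the sample document, and the flattening rule (attributes and child elements become columns) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def read_records(xml_text, root_element, record_element):
    """Return one flat dict per <record_element> directly under <root_element>.

    Hypothetical helper: it only illustrates what the two template
    parameters would mean, not how the template would be implemented.
    """
    root = ET.fromstring(xml_text)
    if root.tag != root_element:
        raise ValueError(f"expected root <{root_element}>, found <{root.tag}>")

    rows = []
    for record in root.findall(record_element):
        # Attributes and child elements both become columns of the row.
        row = dict(record.attrib)
        for child in record:
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


# Assumed sample input: <orders> is the root element, <order> the record element.
SAMPLE = """<orders>
  <order id="1"><customer>Ada</customer><total>19.50</total></order>
  <order id="2"><customer>Lin</customer><total>7.25</total></order>
</orders>"""

rows = read_records(SAMPLE, root_element="orders", record_element="order")
for row in rows:
    print(row)
```

Each resulting row would map to one Parquet record. All values are left as strings here, so a real implementation would still need a schema (for example an Avro schema supplied by the user) to type the columns.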
Benefits:
- Simplifies ETL pipelines for XML data sources.
- Lets XML data be loaded into BigQuery as Parquet, a format that BigQuery load jobs accept natively.
Additional Context:
This enhancement would be valuable for users dealing with XML data and looking for a scalable, managed solution to convert and load data into BigQuery.