Conversation

@Greenscreen23
Contributor

This branch implements more CSV parsing options. The goal is for the following SQL string to be parsed correctly:

COPY "{}" FROM '{}' WITH (FORMAT CSV, DELIMITER '|', NULL '', QUOTE '"');

Here, DELIMITER is the field delimiter of the CSV file, NULL is the string that represents null values in the CSV, and QUOTE is the string used to quote values.
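To make the meaning of the three options concrete, here is a minimal sketch of how a consumer of the parsed statement might apply them to a single CSV line. It is not code from this branch: `split_csv_line` is a hypothetical name, delimiter and quote are assumed to be single characters, and the rules (an unquoted field equal to the null string becomes null, a doubled quote inside a quoted field is a literal quote) are assumptions modeled on Postgres-style CSV.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line into fields.
// std::nullopt stands for a null value.
std::vector<std::optional<std::string>> split_csv_line(const std::string& line, char delimiter, char quote,
                                                       const std::string& null_value) {
  std::vector<std::optional<std::string>> fields;
  std::string current;
  bool in_quotes = false;
  bool was_quoted = false;

  // Closes the current field. Assumption: only unquoted fields can be null.
  const auto finish_field = [&]() {
    if (!was_quoted && current == null_value) {
      fields.emplace_back(std::nullopt);
    } else {
      fields.emplace_back(current);
    }
    current.clear();
    was_quoted = false;
  };

  for (std::size_t i = 0; i < line.size(); ++i) {
    const char c = line[i];
    if (in_quotes) {
      if (c == quote && i + 1 < line.size() && line[i + 1] == quote) {
        current += quote;  // Doubled quote inside a quoted field: literal quote.
        ++i;
      } else if (c == quote) {
        in_quotes = false;  // Closing quote.
      } else {
        current += c;  // Delimiters inside quotes are ordinary characters.
      }
    } else if (c == quote) {
      in_quotes = true;
      was_quoted = true;
    } else if (c == delimiter) {
      finish_field();
    } else {
      current += c;
    }
  }
  finish_field();
  return fields;
}
```

With DELIMITER '|', NULL '', and QUOTE '"', the line `1|"a|b"||""` would yield four fields under these assumptions: `1`, `a|b`, a null, and an empty (non-null) string.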

@Greenscreen23
Contributor Author

Should bison check that the values of delimiter, null, and quote are only set if the format is CSV? This check would have to be done when the options are set (if they are set after the format) and when the format is set (if the values are set before the format). Additionally, this check may have to be done in the import/export statement (if no format is set at all), but we could leave that out (if we don't want to infer the data type from the file string).
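One way to avoid checking in two places is to collect all options first and validate once, after the whole option list has been seen, so the relative order of FORMAT and the CSV options no longer matters. A minimal sketch of that idea, not the code in this branch: `CopyOptions`, `FileFormat`, and `csv_options_are_valid` are hypothetical names, and treating a missing FORMAT as acceptable is an assumption that corresponds to the "leave that out" variant described above.

```cpp
#include <optional>
#include <string>

// Hypothetical types for illustration only.
enum class FileFormat { Csv, Tbl, Binary };

struct CopyOptions {
  std::optional<FileFormat> format;
  std::optional<std::string> delimiter;
  std::optional<std::string> null_value;
  std::optional<std::string> quote;
};

// Runs once after all options were collected, so it does not matter
// whether FORMAT came before or after the CSV options.
bool csv_options_are_valid(const CopyOptions& options) {
  const bool has_csv_option = options.delimiter || options.null_value || options.quote;
  if (!has_csv_option) {
    return true;
  }
  if (!options.format) {
    // Assumption: without an explicit FORMAT, the decision is left to the
    // import/export statement (for example, based on the file name).
    return true;
  }
  return *options.format == FileFormat::Csv;
}
```

The trade-off is that the error is reported at the end of the option list rather than at the offending option, which may or may not matter for the error message.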

@Greenscreen23 changed the title from "Better CSV import" to "CSV import options for DELIMITER, NULL, and QUOTE" on Jul 28, 2025
Member

@dey4ss dey4ss left a comment

Some initial comments. I guess it makes sense to prohibit CSV options for binary files according to Postgres (https://www.postgresql.org/docs/current/sql-copy.html) at some point (either in the parser or on the DBMS side), whatever is easier.

@Greenscreen23
Contributor Author

Some initial comments. I guess it makes sense to prohibit CSV options for binary files according to Postgres (https://www.postgresql.org/docs/current/sql-copy.html) at some point (either in the parser or on the DBMS side), whatever is easier.

Sure, I've tried to implement something like that. Does this fit your needs?

@Greenscreen23 Greenscreen23 requested a review from dey4ss August 2, 2025 16:33
Member

@dey4ss dey4ss left a comment

Some more comments, mostly regarding readability

!SELECT * FROM t WHERE a = DATE '2000-01-01' + x DAYS;
!SELECT * FROM t WHERE a = DATE '2000-01-01' + INTERVAL 'x' DAY;
!SELECT * FROM t WHERE a = DATE '2000-01-01' + INTERVAL '3.3 DAYS';
!COPY students FROM 'file_path' WITH (FORMAT TBL, DELIMITER '|', NULL '', QUOTE '"'); # Cannot have CSV options with non-CSV format
Member

@dey4ss dey4ss Aug 4, 2025

Can you please add further invalid cases (e.g., options passed multiple times, order of format and CSV options, ...)?

Member

@dey4ss dey4ss Aug 4, 2025

This is really necessary to ensure we clean up everything and do not leak when we call YYERROR. Please have a look at every path where you do that and add a case for that, either in a full C++ test or here, so it will be triggered when we do leak checking.
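For context: as far as the Bison manual describes it, the right-hand side symbols of a rule that explicitly raises YYERROR are not discarded automatically, so the action has to release them itself. A minimal sketch of one way to make such an error path leak-free by construction, assuming the option is a heap-allocated pair of a type and a malloc'ed C string as in the diff. `CsvOptionDeleter`, `OwnedCsvOption`, `make_csv_option`, and `set_delimiter` are hypothetical names, not code from this branch.

```cpp
#include <cstdlib>
#include <cstring>
#include <memory>
#include <utility>

enum class CsvOptionType { Delimiter, Null, Quote };
using CsvOption = std::pair<CsvOptionType, char*>;

// Same cleanup as the %destructor in the diff: free the string, delete the pair.
struct CsvOptionDeleter {
  void operator()(CsvOption* option) const {
    std::free(option->second);
    delete option;
  }
};

using OwnedCsvOption = std::unique_ptr<CsvOption, CsvOptionDeleter>;

// Hypothetical factory: copies `value` into malloc'ed memory owned by the option.
OwnedCsvOption make_csv_option(CsvOptionType type, const char* value) {
  char* copy = static_cast<char*>(std::malloc(std::strlen(value) + 1));
  if (copy == nullptr) {
    return nullptr;
  }
  std::strcpy(copy, value);
  return OwnedCsvOption(new CsvOption(type, copy));
}

// Returns false for a duplicate option; the caller would then call YYERROR.
// On that path `option` goes out of scope and releases pair and string.
bool set_delimiter(char*& delimiter, OwnedCsvOption option) {
  if (!option || delimiter != nullptr) {
    return false;
  }
  delimiter = option->second;
  option->second = nullptr;  // The string now belongs to the caller.
  return true;
}
```

Whether wrapping semantic values like this is practical inside the grammar actions is a separate question; the invalid test cases requested above are still what makes the leak checker exercise each path.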

Contributor Author

Sure, I think I got every path (that is not identical to another path)

@Greenscreen23 Greenscreen23 requested a review from dey4ss August 5, 2025 10:02
Member

@dey4ss dey4ss left a comment

Some more comments, still. IMO, the macro clutters the code too much

Member

@dey4ss dey4ss left a comment

Just some comments on style. Good to go apart from that.

}
delete ($$);
} <table_vec> <table_element_vec> <update_vec> <expr_vec> <order_vec> <stmt_vec>
%destructor { free($$->second); delete ($$); } <csv_option_t>
Member

Suggested change
%destructor { free($$->second); delete ($$); } <csv_option_t>
%destructor {
  free($$->second);
  delete ($$);
} <csv_option_t>

Comment on lines 163 to 176
switch (option->first) {
  case CsvOptionType::Delimiter:
    if (delimiter != nullptr) return false;
    delimiter = option->second;
    break;
  case CsvOptionType::Null:
    if (null != nullptr) return false;
    null = option->second;
    break;
  case CsvOptionType::Quote:
    if (quote != nullptr) return false;
    quote = option->second;
    break;
}
Member

Suggested change
switch (option->first) {
  case CsvOptionType::Delimiter:
    if (delimiter != nullptr) return false;
    delimiter = option->second;
    break;
  case CsvOptionType::Null:
    if (null != nullptr) return false;
    null = option->second;
    break;
  case CsvOptionType::Quote:
    if (quote != nullptr) return false;
    quote = option->second;
    break;
}

switch (option->first) {
  case CsvOptionType::Delimiter:
    if (delimiter != nullptr) {
      return false;
    }
    delimiter = option->second;
    break;
  case CsvOptionType::Null:
    if (null != nullptr) {
      return false;
    }
    null = option->second;
    break;
  case CsvOptionType::Quote:
    if (quote != nullptr) {
      return false;
    }
    quote = option->second;
    break;
}

Here (and elsewhere): To compare != nullptr or not, that is the question. Hyrise's guidelines are quite clear about that:

Prefer if (object) over if (object != nullptr) or if (object.has_value()).

On the other hand, this is parser land. What do you think, @Bouncner?
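For readers without the full diff at hand, the snippet under discussion can be sketched as a self-contained unit. This is an illustration under assumptions, not the code in the PR: the surrounding struct `CsvOptions` and the method name `accept` are hypothetical, and only the switch body follows the suggestion above, braces included.

```cpp
#include <utility>

enum class CsvOptionType { Delimiter, Null, Quote };
using CsvOption = std::pair<CsvOptionType, char*>;

// Hypothetical container for the three CSV options.
struct CsvOptions {
  char* delimiter = nullptr;
  char* null = nullptr;
  char* quote = nullptr;

  // Stores the option's value. Returns false if the same option type was
  // already set, which the caller can turn into a parse error.
  bool accept(const CsvOption* option) {
    switch (option->first) {
      case CsvOptionType::Delimiter:
        if (delimiter != nullptr) {
          return false;
        }
        delimiter = option->second;
        break;
      case CsvOptionType::Null:
        if (null != nullptr) {
          return false;
        }
        null = option->second;
        break;
      case CsvOptionType::Quote:
        if (quote != nullptr) {
          return false;
        }
        quote = option->second;
        break;
    }
    return true;
  }
};
```

Note that an empty value such as NULL '' is still a non-null pointer, so the duplicate check works on whether the option was given, not on what it contains.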

Collaborator

Whatever happens in parser land, I'd stick to it (I actually like very explicit checks for pointers and optionals, but I lost that fight back in the day).

Member

It's really mixed here (even within this PR, that's why it caught my attention). I actually like the more implicit form, but I don't want to start a long discussion here. Braces after if are a must, though.

Collaborator

Okay. Then use your recommended snippet? Looks good to me.

Collaborator

@Bouncner Bouncner left a comment

Ready to ignite (good to merge). But maybe still wait for @dey4ss to confirm. :)

Member

@dey4ss dey4ss left a comment

🧨

@Greenscreen23 Greenscreen23 merged commit 3456f97 into hyrise:main Sep 17, 2025
4 checks passed
@Greenscreen23 Greenscreen23 deleted the lukas/better-csv-import branch September 17, 2025 08:58
3 participants