-
Notifications
You must be signed in to change notification settings - Fork 255
CSV import options for DELIMITER, NULL, and QUOTE #256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSV import options for DELIMITER, NULL, and QUOTE #256
Conversation
… following sql string is correctly parsed:
```
COPY "{}" FROM '{}' WITH (FORMAT CSV, DELIMITER '|', NULL '', QUOTE '"');
```
where delimiter is the delimiter of the csv file, null is the value of null values in the csv and quote is the string used to quote values.
|
Should bison check that the values of delimiter, null, and quote are only set if the format is CSV? This check would have to be done when the options are set (if they are set after the format) and when the format is set (if the values are set before the format). Additionally, this check may have to be done in the import/export statement (if no format is set at all), but we could leave that out (if we don't want to infer the data type from the file string). |
dey4ss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments. I guess it makes sense to prohibit CSV options for binary files accordung to Postgres (https://www.postgresql.org/docs/current/sql-copy.html) at some point (either in the parser or on the DBMS side), whatever is easier.
Sure, I've tried to implement something like that. Does this fit your needs? |
dey4ss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments, mostly regarding readability
test/queries/queries-bad.sql
Outdated
| !SELECT * FROM t WHERE a = DATE '2000-01-01' + x DAYS; | ||
| !SELECT * FROM t WHERE a = DATE '2000-01-01' + INTERVAL 'x' DAY; | ||
| !SELECT * FROM t WHERE a = DATE '2000-01-01' + INTERVAL '3.3 DAYS'; | ||
| !COPY students FROM 'file_path' WITH (FORMAT TBL, DELIMITER '|', NULL '', QUOTE '"'); # Cannot have CSV options with non-CSV format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add further invalid cases (e.g., options passed multiple times, order of format and CSV options, ...)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really necessary to ensure we clean up everything and do not leak when we call YYERROR. Please have a look at every path where you do that and add a case fot that, either in a full cpp test or here, so it will be triggered when we do leak checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I think I got every path (that is not identical to another path)
dey4ss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments, still. IMO, the macro clutters the code too much
dey4ss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some comments on style. Good to go apart from that.
src/parser/bison_parser.y
Outdated
| } | ||
| delete ($$); | ||
| } <table_vec> <table_element_vec> <update_vec> <expr_vec> <order_vec> <stmt_vec> | ||
| %destructor { free($$->second); delete ($$); } <csv_option_t> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| %destructor { free($$->second); delete ($$); } <csv_option_t> | |
| %destructor { | |
| free($$->second); | |
| delete ($$); | |
| } <csv_option_t> |
| switch (option->first) { | ||
| case CsvOptionType::Delimiter: | ||
| if (delimiter != nullptr) return false; | ||
| delimiter = option->second; | ||
| break; | ||
| case CsvOptionType::Null: | ||
| if (null != nullptr) return false; | ||
| null = option->second; | ||
| break; | ||
| case CsvOptionType::Quote: | ||
| if (quote != nullptr) return false; | ||
| quote = option->second; | ||
| break; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| switch (option->first) { | |
| case CsvOptionType::Delimiter: | |
| if (delimiter != nullptr) return false; | |
| delimiter = option->second; | |
| break; | |
| case CsvOptionType::Null: | |
| if (null != nullptr) return false; | |
| null = option->second; | |
| break; | |
| case CsvOptionType::Quote: | |
| if (quote != nullptr) return false; | |
| quote = option->second; | |
| break; | |
| } | |
| switch (option->first) { | |
| case CsvOptionType::Delimiter: | |
| if (delimiter != nullptr) { | |
| return false; | |
| } | |
| delimiter = option->second; | |
| break; | |
| case CsvOptionType::Null: | |
| if (null != nullptr) { | |
| return false; | |
| } | |
| null = option->second; | |
| break; | |
| case CsvOptionType::Quote: | |
| if (quote != nullptr) { | |
| return false; | |
| } | |
| quote = option->second; | |
| break; | |
| } |
Here (and elsewhere): To compare != nullptr or not, that is the question. Hyrise's guidelines are quite clear about that:
Prefer
if (object)overif (object != nullptr)orif (object.has_value()).
On the other hand, this is parser land. What dou you think, @Bouncner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever happens in the parser land, I'd stick to it (I actually like very explicit checks for pointers and optionals, but I lost the fight back in the days).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really mixed here (even within this PR, that's why it caught my attention). I actually like the more implicit form, but I don't want to start a long discussion here. Braces after if are a must, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Then use your recommended snipped? Looks good to me.
Bouncner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zündable. But maybe still wait for @dey4ss to confirm. :)
dey4ss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧨
This branch implements more csv parsing options. The goal is that the following sql string is correctly parsed:
where delimiter is the delimiter of the csv file, null is the value of null values in the csv and quote is the string used to quote values.