I am attempting to read data from a .CSV file into a Polars dataframe for analysis. I followed this prior Stack Overflow answer to get help on the layout of the CsvReader's chained methods.
```rust
use polars::prelude::*;

fn read_csv_to_dataframe(path: &str) -> PolarsResult<DataFrame> {
    CsvReader::from_path(path)?.has_header(true).finish()
}
```
When I run the code, I get the following error at runtime:
```
Could not parse "TA1305000009" as dtype i64 at column 'end_station_id' (column number 8).
The current offset in the file is 108268613 bytes.
You might want to try:
- increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),
- specifying correct dtype with the `dtypes` argument
- setting `ignore_errors` to `True`,
- adding "TA1305000009" to the `null_values` list.
```
What I'd like to do:
1) Parse the CSV file
2) Override the inferred `i64` type of the `end_station_id` column so that it is read as a string.
3) Return the resulting dataframe.
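To make step 2 concrete, here is a small sketch of what I think is happening, written against the standard library only. It is not Polars code: `ColType`, `infer_schema`, and `validate` are hypothetical names I made up, and the sample data is invented. It assumes the reader guesses each column's type from the first few rows, which is what the `infer_schema_length` hint in the error suggests, and that a per-column override takes priority over that guess.

```rust
use std::collections::HashMap;

/// Column types known to this toy reader (hypothetical, not Polars' DataType).
#[derive(Debug, Clone, PartialEq)]
enum ColType {
    Int,
    Str,
}

/// Guess each column's type from at most `infer_rows` data rows, then apply
/// the per-column overrides, which take priority over the guess.
fn infer_schema(
    csv: &str,
    infer_rows: usize,
    overrides: &HashMap<&str, ColType>,
) -> Vec<(String, ColType)> {
    let mut lines = csv.lines();
    let header: Vec<&str> = lines.next().unwrap_or("").split(',').collect();
    // Assume every column is an integer column until a sampled value says otherwise.
    let mut types = vec![ColType::Int; header.len()];
    for line in lines.take(infer_rows) {
        for (i, field) in line.split(',').enumerate() {
            if i < types.len() && field.parse::<i64>().is_err() {
                types[i] = ColType::Str;
            }
        }
    }
    header
        .iter()
        .zip(types)
        .map(|(name, guessed)| {
            let ty = overrides.get(*name).cloned().unwrap_or(guessed);
            (name.to_string(), ty)
        })
        .collect()
}

/// Check every data row against the schema and report the first value that does not fit.
fn validate(csv: &str, schema: &[(String, ColType)]) -> Result<(), String> {
    for line in csv.lines().skip(1) {
        for (field, (name, ty)) in line.split(',').zip(schema) {
            if *ty == ColType::Int && field.parse::<i64>().is_err() {
                return Err(format!(
                    "could not parse \"{}\" as i64 in column '{}'",
                    field, name
                ));
            }
        }
    }
    Ok(())
}

fn main() {
    // Invented data: the id column looks numeric until the third row.
    let csv = "ride,end_station_id\n1,13001\n2,13022\n3,TA1305000009\n";

    // Sampling only two rows guesses Int for the id column, so row 3 fails.
    let no_overrides: HashMap<&str, ColType> = HashMap::new();
    let guessed = infer_schema(csv, 2, &no_overrides);
    println!("{:?}", validate(csv, &guessed));

    // Overriding just that one column leaves the other guesses untouched.
    let mut overrides = HashMap::new();
    overrides.insert("end_station_id", ColType::Str);
    let fixed = infer_schema(csv, 2, &overrides);
    println!("{:?}", validate(csv, &fixed));
}
```

In this toy version the override map names only the column I care about, which is the behaviour I am hoping to get from the real reader.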
I can get the dataframe to render when I use `.with_ignore_errors(true)`, but I specifically need the data in the `end_station_id` and `start_station_id` columns.
From reading the docs for CsvReader, I've run into a few possibilities:
1- `with_dtypes()` - Overwrite the schema with the dtypes in this given Schema. The given schema may be a subset of the total schema.
This looks like the best fit, since I don't want to overwrite the whole schema; I only need to override specific fields.
2- `with_schema()` - Set the CSV file’s schema. This only accepts datatypes that are implemented in the csv parser and expects a complete Schema. It is recommended to use with_dtypes instead.
Ideally I would avoid this second option, because I don't want to write out the whole schema from scratch.
I don't know how to implement either of these solutions, since both appear to ask for a `Schema` and I can't quite follow how to construct one.
Help for this newbie Rustacean would be appreciated!