Best Practices
1. Instead of letting Spark infer the schema of a large file, define the schema explicitly. This gives the following benefits:
- Relieves Spark of the work of inferring the schema.
- Prevents Spark from launching a separate job just to read a large portion of the file to determine the schema, which for a large file can be expensive and time-consuming.
- Schema mismatches are detected early.