@klesouza
Last active February 8, 2021 10:18
Reading and writing Parquet files using the parquet-avro library
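import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

// SCHEMA is an org.apache.avro.Schema describing the records; it must be defined elsewhere.
// try-with-resources closes both files; closing the writer is what writes the Parquet footer.
try (ParquetReader<GenericRecord> r = AvroParquetReader.<GenericRecord>builder(new Path("file.parquet"))
         .withDataModel(GenericData.get())
         .build();
     ParquetWriter<GenericRecord> w = AvroParquetWriter.<GenericRecord>builder(new Path("file2.parquet"))
         .withDataModel(GenericData.get())
         .withCompressionCodec(CompressionCodecName.SNAPPY)
         .withSchema(SCHEMA)
         .build()) {
    GenericRecord a;
    // read() returns null once the input is exhausted
    while ((a = r.read()) != null) {
        w.write(a);
    }
}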