Avro Schema – Projection
If the input Avro data file has many fields and we are interested in only a few of them, we can define a new schema that contains just those fields and use it to read the file. This is called projection.
Original schema
{
  "name": "Employee",
  "type": "record",
  "doc": "employee records",
  "fields": [
    { "name": "empId", "type": "string" },
    { "name": "empName", "type": "string" }
  ]
}
New Schema
{
  "name": "Employee",
  "type": "record",
  "doc": "employee records",
  "fields": [
    { "name": "empName", "type": "string" }
  ]
}
We removed the employee ID (empId) from the new schema, so it will not be present in the resulting data.
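A projection only works if the new schema is a valid reader schema for data written with the original one: every field it keeps must exist in the writer schema with a compatible type (or carry a default value). A minimal sketch, assuming an Avro version that ships org.apache.avro.SchemaCompatibility, checks this before any data is read. The class ProjectionCheck and its method canProject are hypothetical names, and the "doc" attribute is left out of the schema strings for brevity.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;

// Hypothetical helper: can a reader (projected) schema read data
// that was written with a given writer schema?
public class ProjectionCheck {

    static final String ORIGINAL = "{\"name\":\"Employee\",\"type\":\"record\",\"fields\":["
            + "{\"name\":\"empId\",\"type\":\"string\"},"
            + "{\"name\":\"empName\",\"type\":\"string\"}]}";

    static final String PROJECTED = "{\"name\":\"Employee\",\"type\":\"record\",\"fields\":["
            + "{\"name\":\"empName\",\"type\":\"string\"}]}";

    public static boolean canProject(String writerJson, String readerJson) {
        // Separate parsers: a single Schema.Parser refuses to define "Employee" twice.
        Schema writer = new Schema.Parser().parse(writerJson);
        Schema reader = new Schema.Parser().parse(readerJson);
        // Argument order is reader first, writer second.
        return SchemaCompatibility.checkReaderWriterCompatibility(reader, writer)
                .getType() == SchemaCompatibilityType.COMPATIBLE;
    }

    public static void main(String[] args) {
        // Dropping a field is allowed: the reader simply skips empId.
        System.out.println(canProject(ORIGINAL, PROJECTED));
        // The reverse is rejected: empId is missing from the data and has no default.
        System.out.println(canProject(PROJECTED, ORIGINAL));
    }
}
```

The second call shows why projection is one-directional: a reader may ignore writer fields, but it cannot invent a field the data never contained unless the reader schema supplies a default.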
Code
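import java.io.InputStream;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public void readAvroFile(String schemaUri, String srcUri) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Parse the new (projected) schema; it is used only as the reader schema.
    Schema.Parser parser = new Schema.Parser();
    Schema schema = parser.parse(fs.open(new Path(schemaUri)));

    InputStream is = fs.open(new Path(srcUri));
    // The reading schema differs from the writing schema. The writer schema is
    // passed as null because DataFileStream sets it from the file header; Avro
    // then skips every field that is not in the reader schema.
    DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(null, schema);
    DataFileStream<GenericRecord> dataFileStream = new DataFileStream<GenericRecord>(is, reader);

    GenericRecord record = null;
    while (dataFileStream.hasNext()) {
        // Reuse the same record object between iterations.
        record = dataFileStream.next(record);
        System.out.println(record);
    }
    dataFileStream.close(); // also closes the underlying input stream
}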
The output shows that only one field, empName, is present in each record.
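To observe the projection without a Hadoop file system, the same read path can be exercised entirely in memory. This minimal sketch assumes only the Avro library on the classpath: it writes one record with the original schema into a byte array through DataFileWriter, then reads it back through DataFileStream with the new schema as the reader schema. The class name ProjectionDemo and the sample values "e1" and "Alice" are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ProjectionDemo {

    // Writer schema: the original schema with both fields.
    static final String FULL_SCHEMA =
        "{\"name\":\"Employee\",\"type\":\"record\",\"doc\":\"employee records\","
        + "\"fields\":[{\"name\":\"empId\",\"type\":\"string\"},"
        + "{\"name\":\"empName\",\"type\":\"string\"}]}";

    // Reader schema: the new schema that keeps only empName.
    static final String PROJECTED_SCHEMA =
        "{\"name\":\"Employee\",\"type\":\"record\",\"doc\":\"employee records\","
        + "\"fields\":[{\"name\":\"empName\",\"type\":\"string\"}]}";

    public static List<GenericRecord> roundTrip() throws IOException {
        // Separate parsers, since one parser cannot define "Employee" twice.
        Schema full = new Schema.Parser().parse(FULL_SCHEMA);
        Schema projected = new Schema.Parser().parse(PROJECTED_SCHEMA);

        // Write one record with the full schema into an in-memory container file.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(full))) {
            writer.create(full, out);
            GenericRecord rec = new GenericData.Record(full);
            rec.put("empId", "e1");      // hypothetical sample value
            rec.put("empName", "Alice"); // hypothetical sample value
            writer.append(rec);
        }

        // Read it back: null writer schema (taken from the file header),
        // projected reader schema, so empId is skipped during decoding.
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(null, projected);
        List<GenericRecord> result = new ArrayList<>();
        try (DataFileStream<GenericRecord> stream =
                 new DataFileStream<>(new ByteArrayInputStream(out.toByteArray()), reader)) {
            for (GenericRecord r : stream) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        for (GenericRecord r : roundTrip()) {
            System.out.println(r);
        }
    }
}
```

Each record returned by roundTrip carries the projected schema, so asking its schema for empId yields null. Note that string values come back as org.apache.avro.util.Utf8 by default, which is why comparisons go through toString().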
The full project is available here.