Avro Schema – Projection

If the input Avro data file has many fields and we are interested in only a few of them, we can define a new schema containing just those fields and read the file with it. This is called projection.

Original schema

{
  "name": "Employee",
  "type": "record",
  "doc": "employee records",
  "fields": [
    {"name": "empId", "type": "string"},
    {"name": "empName", "type": "string"}
  ]
}

New Schema

{
  "name": "Employee",
  "type": "record",
  "doc": "employee records",
  "fields": [
    {"name": "empName", "type": "string"}
  ]
}

We removed the employee id (empId) from the new schema, so it will not be present in the resulting data.
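To make the idea concrete, here is a minimal sketch in plain Java of what projection does conceptually: the reader's fields are matched by name against the record as it was written, and any written field the reader does not ask for is dropped. This is not the Avro API. The class ProjectionSketch, the method project, and the sample values "E001" and "Alice" are hypothetical names made up for illustration, and a record is modelled as a simple map.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: models a record as a Map, not as an Avro GenericRecord.
class ProjectionSketch {

    // Keep only the fields named in readerFields, in the reader's order.
    static Map<String, Object> project(Map<String, Object> written, List<String> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String field : readerFields) {
            if (!written.containsKey(field)) {
                // Real Avro would fall back to the field's default value if the
                // reader schema declares one; this sketch simply rejects the record.
                throw new IllegalArgumentException("Field not in written record: " + field);
            }
            result.put(field, written.get(field));
        }
        return result;
    }

    public static void main(String[] args) {
        // A record as written with the original schema (made-up sample values).
        Map<String, Object> written = new LinkedHashMap<>();
        written.put("empId", "E001");
        written.put("empName", "Alice");

        // The new schema lists only empName.
        System.out.println(project(written, List.of("empName"))); // prints {empName=Alice}
    }
}
```

In real Avro the same matching is done by the datum reader when it is given a reader schema that differs from the schema the file was written with, which is what the code in the next section sets up.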

Code

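import java.io.InputStream;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public void readAvroFile(String schemaUri, String srcUri) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Parse the new (projected) schema; this is the reader schema.
    Schema schema;
    try (InputStream schemaIn = fs.open(new Path(schemaUri))) {
        schema = new Schema.Parser().parse(schemaIn);
    }
    // The reader schema differs from the writer schema. The writer schema is
    // passed as null because DataFileStream sets it from the file header.
    DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(null, schema);
    try (InputStream is = fs.open(new Path(srcUri));
         DataFileStream<GenericRecord> dataFileStream =
                 new DataFileStream<GenericRecord>(is, reader)) {
        GenericRecord record = null;
        while (dataFileStream.hasNext()) {
            record = dataFileStream.next(record); // reuse the same record object
            System.out.println(record);
        }
    }
}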
 

In the result you can see that only one field, empName, is present.

[Image: Avro Schema Projection]

The full project is available here.
