Avro Schema: Adding New Fields

The reading schema doesn't have to be the same as the writing schema. You can add new fields or leave out existing ones (projection).

If you add a new field to the reading schema, you have to specify a default value for it. Avro uses that default whenever the field is not present in the data.

Original schema
{
  "name": "Employee",
  "type": "record",
  "doc": "employee records",
  "fields": [{
    "name": "empId",
    "type": "string"
  }, {
    "name": "empName",
    "type": "string"
  }]
}

New schema with added field

{
  "name": "Employee",
  "type": "record",
  "doc": "employee records",
  "fields": [{
    "name": "empId",
    "type": "string"
  }, {
    "name": "empName",
    "type": "string"
  }, {
    "name": "position",
    "type": "string",
    "default": "unknown"
  }]
}
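
To make the rule concrete, here is a toy model of how a reader fills in fields. It uses plain Java maps and no Avro classes, so it only illustrates the idea and is not how Avro is implemented. The class name SchemaResolutionSketch, the resolve method, and the NO_DEFAULT marker are all made up for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: mimics the "use the default when the field is missing" rule
// with plain maps. None of these names come from the Avro library.
class SchemaResolutionSketch {

    // Hypothetical marker meaning "this reader field declares no default".
    static final Object NO_DEFAULT = new Object();

    // written:      field name -> value as stored by the writer
    // readerFields: field name -> default value (or NO_DEFAULT), in reader order
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerFields.entrySet()) {
            String name = field.getKey();
            if (written.containsKey(name)) {
                // Field exists in the data: take the written value.
                result.put(name, written.get(name));
            } else if (field.getValue() != NO_DEFAULT) {
                // Field is new in the reading schema: fall back to its default.
                result.put(name, field.getValue());
            } else {
                throw new IllegalStateException(
                        "Field '" + name + "' is missing and has no default");
            }
        }
        // Written fields that the reader does not list are dropped (projection).
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> written = new LinkedHashMap<>();
        written.put("empId", "E100");
        written.put("empName", "Alice");

        Map<String, Object> readerFields = new LinkedHashMap<>();
        readerFields.put("empId", NO_DEFAULT);
        readerFields.put("empName", NO_DEFAULT);
        readerFields.put("position", "unknown");

        // prints {empId=E100, empName=Alice, position=unknown}
        System.out.println(resolve(written, readerFields));
    }
}
```

If "position" had no default in this model, resolve would throw, which mirrors why the new field in the schema above needs one.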

When the reading schema differs from the writing schema, the DatumReader needs both, and you can pass them to its constructor. An Avro data file already stores the writing schema in its header, so when you read such a file you can pass null for the writing schema, as in the code below.

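import java.io.InputStream;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public void readAvroFile(String schemaUri, String srcUri) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Parse the reading schema (the new schema with the added field).
    Schema schema;
    try (InputStream schemaStream = fs.open(new Path(schemaUri))) {
        schema = new Schema.Parser().parse(schemaStream);
    }
    // Writing schema is null here: DataFileStream takes it from the file header.
    DatumReader<GenericRecord> reader =
        new GenericDatumReader<GenericRecord>(null, schema);
    // try-with-resources closes both streams even if reading fails.
    try (InputStream is = fs.open(new Path(srcUri));
         DataFileStream<GenericRecord> dataFileStream =
             new DataFileStream<GenericRecord>(is, reader)) {
        GenericRecord record = null;
        while (dataFileStream.hasNext()) {
            record = dataFileStream.next(record); // reuse the record object
            System.out.println(record);
        }
    }
}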
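
For data that does not carry its own schema, such as raw binary-encoded records, both schemas have to be passed explicitly. The following is a minimal sketch of that case, assuming the Avro library is on the classpath. It writes one record in memory with the original schema and reads it back with the new one. The class name AvroDefaultFieldDemo, the method readWithNewSchema, and the sample values are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

// Illustrative sketch; requires the Avro jar on the classpath.
class AvroDefaultFieldDemo {

    static final String ORIGINAL_SCHEMA =
        "{\"name\":\"Employee\",\"type\":\"record\",\"fields\":["
        + "{\"name\":\"empId\",\"type\":\"string\"},"
        + "{\"name\":\"empName\",\"type\":\"string\"}]}";

    static final String NEW_SCHEMA =
        "{\"name\":\"Employee\",\"type\":\"record\",\"fields\":["
        + "{\"name\":\"empId\",\"type\":\"string\"},"
        + "{\"name\":\"empName\",\"type\":\"string\"},"
        + "{\"name\":\"position\",\"type\":\"string\",\"default\":\"unknown\"}]}";

    static GenericRecord readWithNewSchema() throws IOException {
        // Separate parsers: one parser refuses to define "Employee" twice.
        Schema writerSchema = new Schema.Parser().parse(ORIGINAL_SCHEMA);
        Schema readerSchema = new Schema.Parser().parse(NEW_SCHEMA);

        // Write a record that has no "position" field.
        GenericRecord employee = new GenericData.Record(writerSchema);
        employee.put("empId", "E100");
        employee.put("empName", "Alice");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(employee, encoder);
        encoder.flush();

        // Raw binary data has no header, so both schemas are passed explicitly.
        BinaryDecoder decoder =
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        return new GenericDatumReader<GenericRecord>(writerSchema, readerSchema)
            .read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        // "position" should come back holding the default from the new schema.
        System.out.println(readWithNewSchema());
    }
}
```

String values read this way are typically returned as Avro's Utf8 objects rather than java.lang.String, so call toString() before comparing them.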


The full project is available on GitHub.
