Hadoop supports many input and output file formats. In this post we will see how to read a text file and save the results in SequenceFile format.
The file is CSV with the following fields
Our requirement is to group the songs by artist, so the resulting file will have a single record for each artist, containing all the songs sung by that artist.
First we need to ignore the record with offset 0 (the key passed to the map), as it contains the field names.
Then we split the record on commas and write field 4 (artist) as the key and field 5 (song) as the value.
// Assuming fields are counted from 1, field 4 is tokens[3] and field 5 is tokens[4].
context.write(new Text(tokens[3]), new Text(tokens[4]));
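To make the map logic easy to try in isolation, here is a minimal sketch in plain Java, without the Hadoop classes. The names `SongMapSketch` and `mapRecord` are hypothetical, the sample lines are made up, and the sketch assumes that fields are counted from 1 (so field 4 is index 3) and that no field contains an embedded comma.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical stand-in for the mapper logic; this is not the Hadoop Mapper itself.
final class SongMapSketch {

    // Returns (artist, song) for a data line, or null for the header line
    // (offset 0) and for lines with fewer than five fields.
    static Map.Entry<String, String> mapRecord(long offset, String line) {
        if (offset == 0) {
            return null; // the first line holds the field names
        }
        String[] tokens = line.split(",");
        if (tokens.length < 5) {
            return null; // malformed record, skip it
        }
        // Assumption: field 4 = artist, field 5 = song, counted from 1.
        return new SimpleEntry<>(tokens[3].trim(), tokens[4].trim());
    }

    public static void main(String[] args) {
        // Made-up sample lines, only to exercise the method.
        System.out.println(mapRecord(0L, "id,year,genre,artist,song"));
        System.out.println(mapRecord(27L, "1,1965,rock,The Beatles,Yesterday"));
    }
}
```

In the real mapper the same two steps apply: return early when the key is 0, otherwise call `context.write` with the two fields.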
On the reducer side we just need to convert all the songs for an artist into a single Writable.
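ArrayList<Text> vs = new ArrayList<Text>();
// "values" is assumed to be the reducer's Iterable<Text>; copy each item because Hadoop reuses the value object.
for (Text v : values) vs.add(new Text(v));
context.write(key, new Text(vs.toString()));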
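The grouping that the shuffle and the reducer perform together can be sketched in plain Java as follows. This is only an illustration of the idea, not Hadoop code: `SongReduceSketch` and `groupByArtist` are hypothetical names, the sample pairs are made up, and a `TreeMap` stands in for the sorted keys a reducer receives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for shuffle + reduce; this is not the Hadoop Reducer itself.
final class SongReduceSketch {

    // Each pair is {artist, song}. The result has one entry per artist,
    // holding that artist's songs in the order they were seen.
    static Map<String, List<String>> groupByArtist(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Made-up sample pairs, as the mapper might emit them.
        List<String[]> pairs = Arrays.asList(
                new String[] {"The Beatles", "Yesterday"},
                new String[] {"Queen", "Bohemian Rhapsody"},
                new String[] {"The Beatles", "Let It Be"});
        for (Map.Entry<String, List<String>> e : groupByArtist(pairs).entrySet()) {
            // One output record per artist, like context.write(key, ...).
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```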
Now the reducer output will be saved in SequenceFile format, provided the job's output format is set to SequenceFileOutputFormat. In the reducer there is a better way to store the song collection: instead of adding the songs to an ArrayList and then converting it to a Text, we can use ArrayWritable.
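One reason the array approach is attractive: `ArrayList.toString()` joins the items with ", ", so a song title that itself contains a comma can no longer be separated from its neighbours when the record is read back. ArrayWritable instead writes the number of elements followed by each element. The sketch below imitates that count-then-elements idea using only the standard library. It is not Hadoop's actual wire format, and `SongListCodecSketch`, `encode` and `decode` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of count-prefixed serialization; not Hadoop's format.
final class SongListCodecSketch {

    // Writes the element count, then each song as a length-prefixed string.
    static byte[] encode(List<String> songs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(songs.size());
            for (String song : songs) {
                out.writeUTF(song);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the count back, then exactly that many songs.
    static List<String> decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            List<String> songs = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                songs.add(in.readUTF());
            }
            return songs;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> songs = Arrays.asList("Hello, Goodbye", "Yesterday");
        System.out.println(songs);                 // the text form blurs the item boundaries
        System.out.println(decode(encode(songs))); // the round trip keeps two separate items
    }
}
```

In an actual job, the usual pattern is a small subclass of ArrayWritable with a no-argument constructor that fixes the element class to Text, so that the values can be instantiated again when the SequenceFile is read.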
Full project is available here