Reading HDFS files from a Java program

Any Java program can read files from HDFS, provided the Hadoop client libraries are on its classpath. In this post we will read a file from HDFS and print its contents to the console.

Figure: HDFS file read (source: Hadoop: The Definitive Guide)

The program flow:
  1. Create a Configuration object
  2. Create a FileSystem object by passing in the Configuration object (for HDFS, the concrete FileSystem is a DistributedFileSystem)
  3. Get an InputStream from the file system by passing the file path (the object returned is actually an FSDataInputStream)

The program below reads the input stream in the most basic way, one byte at a time. For text files, you can instead wrap the stream in a buffered reader and use its readLine method.


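package net.icircuit.hadoop.HDFSRead;
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Main {

 public static void main(String[] args) throws IOException {
  // Picks up the Hadoop configuration files found on the classpath
  Configuration conf = new Configuration();
  // Returns the default file system named in that configuration
  FileSystem fs = FileSystem.get(conf);
  // try-with-resources closes the stream even if reading fails
  try (InputStream is = fs.open(new Path(args[0]))) {
   int b;
   // read() returns -1 at end of file; available() is not a reliable end-of-file test
   while ((b = is.read()) != -1) {
    // Write the raw byte so multi-byte characters are not mangled by a char cast
    System.out.write(b);
   }
   System.out.flush();
  }
 }
}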
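
The buffered, line-by-line alternative mentioned earlier could look roughly like the sketch below. It is only an illustration, not part of the original project: the class name LineReaderSketch and the method readLines are hypothetical, and the file is assumed to be UTF-8 text. The helper accepts any java.io.InputStream, so the stream returned by fs.open(...) could be passed to it directly. Here an in-memory stream stands in for HDFS so the sketch runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of the original project.
class LineReaderSketch {

    // Reads every line from the stream, decoding it as UTF-8 (an assumption
    // about the file's encoding), and closes the stream when done.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            // readLine() returns null once the end of the stream is reached
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the HDFS stream. With Hadoop on the classpath,
        // the stream from fs.open(new Path(args[0])) would be passed instead.
        InputStream in = new ByteArrayInputStream(
                "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readLines(in)) {
            System.out.println(line);
        }
    }
}
```

Decoding through an InputStreamReader with an explicit charset avoids casting single bytes to char, which breaks on any character that takes more than one byte.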


Full project is available here.
