Hadoop Code - Deep Dive Series - Part 3 - File System XAttrs

Hadoop Common hosts the enums for extended attributes. Extended attributes are a very useful feature of the HDFS file system, so I decided to take a deep dive into XAttr.

Let's start with what the official documentation says:

Extended attributes (abbreviated as xattrs) are a filesystem feature that allow user applications to associate additional metadata with a file or directory. Unlike system-level inode metadata such as file permissions or modification time, extended attributes are not interpreted by the system and are instead used by applications to store additional information about an inode. Extended attributes could be used, for instance, to specify the character encoding of a plain-text document.

Wow! This sparked a lot of ideas and questions for me. A few of them:

  • Why not cache the file schema here instead of calling a central metastore?
  • Can we store some precomputed results here to speed up queries?

HDFS-2006 covers the discussion, requirements, design and test plan in detail. Some of the most important comments there are from Chris Nauroth and Yi Liu. The latest design document also helps a lot in understanding the nitty-gritty of the concept.

The in-memory data structure of an XAttr consists of three fields:

private final NameSpace ns;

private final String name;

private final byte[] value;
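
To make these three fields concrete, here is a minimal, self-contained sketch of how a prefixed name such as user.encoding could be split into a namespace and a name. It is a simplified model, not the Hadoop class itself: the class name `XAttrSketch` and the `parse` helper are hypothetical, and upper-casing the prefix before the enum lookup is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Simplified, hypothetical model of an extended attribute (not Hadoop source code).
final class XAttrSketch {
    // Namespaces that HDFS recognises for xattr names.
    enum NameSpace { USER, TRUSTED, SECURITY, SYSTEM, RAW }

    static final class XAttr {
        private final NameSpace ns;
        private final String name;
        private final byte[] value;

        XAttr(NameSpace ns, String name, byte[] value) {
            this.ns = ns;
            this.name = name;
            // Defensive copy so callers cannot mutate the stored bytes.
            this.value = value == null ? null : value.clone();
        }

        NameSpace getNameSpace() { return ns; }
        String getName() { return name; }
        byte[] getValue() { return value == null ? null : value.clone(); }
    }

    // Hypothetical helper: splits "user.encoding" into (USER, "encoding").
    static XAttr parse(String prefixedName, byte[] value) {
        int dot = prefixedName.indexOf('.');
        if (dot <= 0 || dot == prefixedName.length() - 1) {
            throw new IllegalArgumentException(
                "Expected <namespace>.<name>, got: " + prefixedName);
        }
        // Assumption of this sketch: the namespace prefix is matched case-insensitively.
        String prefix = prefixedName.substring(0, dot).toUpperCase(Locale.ROOT);
        NameSpace ns = NameSpace.valueOf(prefix); // IllegalArgumentException if unknown
        return new XAttr(ns, prefixedName.substring(dot + 1), value);
    }

    public static void main(String[] args) {
        XAttr attr = parse("user.encoding", "UTF-8".getBytes(StandardCharsets.UTF_8));
        System.out.println(attr.getNameSpace() + " / " + attr.getName() + " / "
            + new String(attr.getValue(), StandardCharsets.UTF_8));
    }
}
```

The value is kept as raw bytes, so the application decides how to encode and decode it.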

So, does this increase the NameNode memory footprint?

Not for inodes that carry no XAttrs, and the reason is the design itself: XAttrs are implemented as an INode feature. Here is what the design document states:

One goal is to minimized increase in the NameNode memory footprint. Deployments that do not wish to use XAttrs must not suffer a large increase in NN memory consumption as a side effect of introducing this feature.

The design develops XAttrs as a new inode feature. The new feature contains a list of XAttrs, and it is attached to an inode only if that inode has a XAttr. When load fsimage, whether inode has a XAttr is through a bit of int which is shared in inode. In this way, this feature does not increase NN memory on deployments that choose not to enable or have XAttrs.
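
The "attach the feature only when needed" idea can be sketched in a few lines. This is a simplified model under my own assumptions, not HDFS source code: `InodeFeatureSketch`, `Inode`, `Feature` and `XAttrListFeature` are hypothetical names, and the real NameNode classes are considerably more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical model of the "inode feature" idea (not HDFS source code).
final class InodeFeatureSketch {
    /** Marker for optional, per-inode extensions. */
    interface Feature {}

    /** Holds the xattr names of one inode; created lazily on first use. */
    static final class XAttrListFeature implements Feature {
        private final List<String> names = new ArrayList<>();
    }

    static final class Inode {
        // One shared empty array: an inode without features allocates nothing extra.
        private static final Feature[] NO_FEATURES = new Feature[0];
        private Feature[] features = NO_FEATURES;

        private XAttrListFeature xattrFeature() {
            for (Feature f : features) {
                if (f instanceof XAttrListFeature) {
                    return (XAttrListFeature) f;
                }
            }
            return null;
        }

        void addXAttr(String name) {
            XAttrListFeature feature = xattrFeature();
            if (feature == null) {
                // First xattr on this inode: attach the feature now.
                feature = new XAttrListFeature();
                features = Arrays.copyOf(features, features.length + 1);
                features[features.length - 1] = feature;
            }
            feature.names.add(name);
        }

        int featureCount() { return features.length; }

        List<String> xattrNames() {
            XAttrListFeature feature = xattrFeature();
            return feature == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(feature.names);
        }
    }

    public static void main(String[] args) {
        Inode plain = new Inode();
        Inode tagged = new Inode();
        tagged.addXAttr("user.encoding");
        System.out.println("plain features:  " + plain.featureCount());
        System.out.println("tagged features: " + tagged.featureCount());
        System.out.println("tagged xattrs:   " + tagged.xattrNames());
    }
}
```

In this model an inode that never receives an xattr keeps pointing at the shared empty array, which is the property the design document is after.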

So, coming back to my two ideas after reviewing the design docs and HDFS-2006, here are my thoughts:

  • Schema-related metadata can be attached to the directory instead of individual files (distributed programs tend to generate multiple part files). Maybe projects like Hive could provide an option to set the schema at write time, so that MR, Tez or Spark client readers can rely on HDFS rather than an external metastore.
  • XAttrs are not the right place for large data such as pre-aggregated results. It is better to store a reference to the large data (a path on HDFS or any other distributed FS) in the xattr value and direct clients to read from that file, so that precious NameNode memory is not overloaded. The design also imposes a few limits to keep large data out of xattrs.
  • XAttrs could also keep track of how a schema evolved (useful for archival, if the schema is stored with the files).
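
The thoughts above can be sketched as a small in-memory model: a directory carries a pointer to its schema file plus a version number, and oversized values are rejected. Everything here is an assumption for illustration. `SchemaPointerSketch`, `DirectoryXAttrs`, the attribute names user.schema.path and user.schema.version, and the example path are hypothetical, and the two limits are assumed values modelled on the dfs.namenode.fs-limits.max-xattr-size and dfs.namenode.fs-limits.max-xattrs-per-inode settings. On a real cluster the equivalent calls would go through `FileSystem.setXAttr` and `FileSystem.getXAttr`.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the xattrs of one HDFS directory.
final class SchemaPointerSketch {
    // Assumed limits for this sketch; a real cluster takes them from its configuration.
    static final int MAX_XATTR_SIZE = 16 * 1024;   // name + value, in bytes
    static final int MAX_XATTRS_PER_INODE = 32;

    static final class DirectoryXAttrs {
        private final Map<String, byte[]> attrs = new LinkedHashMap<>();

        void set(String name, byte[] value) {
            int size = name.getBytes(StandardCharsets.UTF_8).length + value.length;
            if (size > MAX_XATTR_SIZE) {
                throw new IllegalArgumentException(
                    "xattr " + name + " is too large: " + size + " bytes");
            }
            if (!attrs.containsKey(name) && attrs.size() >= MAX_XATTRS_PER_INODE) {
                throw new IllegalStateException("too many xattrs on this inode");
            }
            attrs.put(name, value.clone());
        }

        String getString(String name) {
            byte[] value = attrs.get(name);
            return value == null ? null : new String(value, StandardCharsets.UTF_8);
        }
    }

    // Stores a small pointer to the schema file instead of the schema itself.
    static void attachSchemaPointer(DirectoryXAttrs dir, String schemaPath, int version) {
        dir.set("user.schema.path", schemaPath.getBytes(StandardCharsets.UTF_8));
        dir.set("user.schema.version",
            Integer.toString(version).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        DirectoryXAttrs salesDir = new DirectoryXAttrs();
        attachSchemaPointer(salesDir, "/schemas/sales/v3.avsc", 3);
        System.out.println("schema at " + salesDir.getString("user.schema.path")
            + ", version " + salesDir.getString("user.schema.version"));
    }
}
```

A reader would first fetch the two small attributes from the directory and then open the referenced schema file directly, so the bulky content never sits in NameNode memory.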

I understand that advanced file formats such as Avro, ORC and Parquet have solved a similar problem, but XAttrs are self-contained and work for any file in HDFS, regardless of its format. Keeping simple metadata this way is a great feature: apps that use XAttrs can reduce operational overhead, cut query latency and become easier to maintain.

What kind of tools and techniques do you think can be developed using XAttr? Feel free to start a discussion in the comment section below.

If you are interested in further reading: