HDF5 data format

From SAXSutilities wiki
In the beginning there was the EDF format...
INFO: Problems caused by  the  "ESRF data format"  are unfortunately far too
      common.  The orginal software available  from  the  ESRF  is  far  too
      bugged to  be usable.  It  is  not available for many of the operating
      systems on which FIT2D is required to run,  and  users use.  Therefore
      I  (and others) have written their own input routines.  However,  this
      is  only  a partial solution.  This is because the "format" is totally
      inadequately  defined,   and   even   where  defined  there  are  huge
      differences between the specification and the files actually produced. 
      It  is  also  an  unnecessarily complicated format  with  certain data
      compression schemes almost totally undefined.  Therefore this code has
      only been written to input a small subset of the possible file formats
      which could be produced.  In particular all header information must be
      found before the image data, and data compression is not supported.

Message when importing EDF files in Fit2D

...now we have a NeXus standard
NeXus is an effort by an international group of scientists to define a common 
data exchange and archival format for neutron, X-ray and muon experiments. 
NeXus is built on top of the scientific data format HDF5 and adds
domain-specific rules for organizing data within HDF5 files, in addition to a
dictionary of well defined domain-specific field names. The NeXus data format has two 
purposes.  First,  it defines a format  that can serve as a container for all 
relevant data associated with a beamline.  This is a very important use case. 
Second,  it defines standards in the form of application  definitions for the 
exchange  of data  between applications.  NeXus provides  structures  for raw 
experimental data as well as for processed data.

J. Appl. Cryst. (2015). 48, 301-305

This sounds promising, but reality is, as usual, far more chaotic. In fact, reading or writing data/error arrays together with the important metadata is nowadays rather straightforward in EDF format: routines are available for Python (fabio library) and Matlab (see EDF data format).

NeXus, on the other hand, is a standard that leaves a lot of room for interpretation. As a result, there are unfortunately no routines readily available for Python or Matlab that read data/error arrays and metadata from HDF5 files as comfortably as is the case for EDF files. In the following I list the search paths for data and metadata for a selection of known interpretations of the NeXus standard. The main problem in many cases is to determine programmatically which file type is present, as the standard foresees no common identifier. Once the type is known, extracting the data is rather easy, because high-level routines are available in most programming languages. The correct interpretation of the metadata has to be taken care of individually in each case.

List of important HDF 'formats' using NeXus standard - RAW data

Lima written raw data at ESRF from 2020

entry               further HDF5 groups   interpretation
entry_{number}                            * default attribute at root level which points to relevant entry number
                                          * default attribute at entry level which points to relevant plot (NXdata)
                    instrument
                    {detector_name}
                    plot                  NX_class=NXdata
                    data                  data array
                    header                group with static metadata - at same level as plot
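
The two default attributes listed above can be followed mechanically. The following is a minimal sketch, not an official reader. It assumes the file has been opened with h5py (or any object offering the same item access and `attrs` mapping) and that the entry-level default attribute holds a name or path relative to the entry. The helper names `resolve_default_nxdata` and `_as_text`, as well as the file name in the usage comment, are hypothetical.

```python
def _as_text(value):
    """Return an HDF5 attribute value as str (it may arrive as bytes)."""
    return value.decode() if isinstance(value, bytes) else str(value)


def resolve_default_nxdata(root):
    """Follow root 'default' -> entry 'default' -> NXdata group.

    `root` is an open h5py.File or any object with the same interface.
    A KeyError is raised if a 'default' attribute, or the group it
    names, is missing.
    """
    entry_name = _as_text(root.attrs["default"])   # e.g. an entry_{number} name
    entry = root[entry_name]
    plot_path = _as_text(entry.attrs["default"])   # assumed relative to the entry
    return entry[plot_path]


# Usage with h5py (the file name is hypothetical):
#
#   import h5py
#   with h5py.File("raw_file.h5", "r") as f:
#       nxdata = resolve_default_nxdata(f)
#       images = nxdata["data"]    # 'data' as listed in the table above
```

The same helper should apply to the other layouts that carry both default attributes; only the name of the array inside the NXdata group differs.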

Lima written raw data at ESRF before 2020

entry               further HDF5 groups   interpretation
entry_{number}                            * default attribute at root level which points to relevant entry number
                                          * default attribute at entry level which points to relevant NXdata group
                    measurement
                    {detector_name}
                    data                  NX_class=NXdata
                    array                 data array
                    header                group with static metadata

Dynamic metadata for both types of raw data (scalers file)

entry               further HDF5 groups   interpretation
entry_{number}                            * default attribute at root level which points to relevant entry number
                    id02
                    MCS                   multi channel scalers
                    TFG                   time frame generator
                    parameters            group with copy of static metadata

Dectris written raw data

entry               further HDF5 groups   interpretation
entry
                    data                  NX_class=NXdata
                    data                  data array
                                          no metadata available
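
The table above lists no default attributes for the Dectris layout, so a lookup via default attributes cannot be relied upon there. One possible fallback, sketched here under the assumption that NXdata groups carry an NX_class attribute as noted in the tables, is to walk the file and collect every group marked as NXdata. The function name `find_nxdata` is hypothetical; the argument is an open h5py.File or any object with the same mapping interface.

```python
def _as_text(value):
    """Return an HDF5 attribute value as str (it may arrive as bytes)."""
    return value.decode() if isinstance(value, bytes) else str(value)


def find_nxdata(group, path=""):
    """Yield the path of every sub-group whose NX_class attribute is 'NXdata'.

    Links are followed like ordinary members, so a group that is linked
    from two places is reported once per location.
    """
    for name in group.keys():
        child = group[name]
        if not hasattr(child, "keys"):      # datasets have no keys(): skip them
            continue
        child_path = f"{path}/{name}"
        if _as_text(child.attrs.get("NX_class", "")) == "NXdata":
            yield child_path
        yield from find_nxdata(child, child_path)


# Usage with h5py (the file name is hypothetical):
#
#   import h5py
#   with h5py.File("master_file.h5", "r") as f:
#       print(list(find_nxdata(f)))
```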

List of important HDF 'formats' using NeXus standard - reduced data

Written by online data reduction at ESRF (pyFAI/DAHU) - SAXS

entry               further HDF5 groups   interpretation
entry_{number}                            different entries represent repetitions of data reduction
                                            • default attribute at root level which points to relevant entry number
                                            • default attribute at entry level which points to relevant data set (NXdata)
                    PyFAI
                    result_{type}         NX_class=NXdata
                                          name indicates type of data reduction
                    data                  data array
                    data_errors           error array - RELATIVE
                                          variance = relative_error^2
                    t                     time - important for image series
                    q                     {azim} and {ave}: q vector (unlike for edf files, q cannot be calculated from metadata)
                    chi                   {azim}: azimuthal angle (unlike for edf files, it cannot be calculated from metadata)
                    parameters            copy of static and dynamic metadata from raw images (not all parameters describe the data arrays)

Written by online data reduction at ESRF (pyFAI/DAHU) - XPCS

entry               further HDF5 groups   interpretation
entry_{number}                            different entries represent repetitions of data reduction
                                            • default attribute at root level which points to relevant entry number
                                            • default attribute at entry level which points to relevant data set (NXdata)
                    1_XPCS
                    results               NX_class=NXdata
                    g2                    correlation function
                    t                     time - important for image series
                    q                     q vector

Written by SAXSutilities2 package

entry               further HDF5 groups   interpretation
entry_{number}                            different entries represent repetitions of data reduction
                                            • default attribute at root level which points to relevant entry number
                                            • default attribute at entry level which points to relevant data set (NXdata)
                    saxsutilities
                    data                  NX_class=NXdata
                    array                 data array
                    array_errors          error array - VARIANCE
                                          like in edf files (certain file types only)
                    t                     time - important for image series
                    q                     q vector (q can also be calculated from metadata as in the case of edf files) (certain file types only)
                    chi                   azimuthal angle (can also be calculated from metadata as in the case of edf files) (certain file types only)
                    header_array          metadata describing data array
                    header_array_errors   metadata describing variance array (certain file types only)

Written by saxs programs package

entry               further HDF5 groups   interpretation
SXentry_{number1}
                    SXseries_{number2}    {number2} indicates different images of image series (file type A)
                    SXmemory_{number3}
                    SXdata                image array - images of image series can also be written in 3 dimensional array (file type B)
                    SXerror               variance array
                    SXheader              metadata describing each data array
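
Since no common identifier exists, one way to guess the file type is to probe for the group names listed in the tables above. The following is only a sketch under several assumptions: the first group named in each table is taken to sit directly below the entry, the Dectris layout is recognised by an entry called exactly "entry", and the first match wins even if a file happens to contain several of these groups. The function name `detect_layout` and the returned labels are hypothetical; the argument is an open h5py.File or any object with the same mapping interface.

```python
# Group expected directly below an entry_{number} group -> layout label.
# Order matters: the first matching marker decides (an assumption).
_MARKERS = [
    ("PyFAI", "pyFAI/DAHU SAXS"),
    ("1_XPCS", "pyFAI/DAHU XPCS"),
    ("saxsutilities", "SAXSutilities2"),
    ("id02", "ESRF scalers"),
    ("instrument", "Lima raw (from 2020)"),
    ("measurement", "Lima raw (before 2020)"),
]


def detect_layout(root):
    """Return a label for the first recognised layout, or None."""
    for entry_name in root.keys():
        entry = root[entry_name]
        if not hasattr(entry, "keys"):          # not a group: ignore
            continue
        if entry_name.startswith("SXentry_"):
            return "saxs programs"
        if entry_name == "entry" and "data" in entry:
            return "Dectris raw"
        for marker, label in _MARKERS:
            if marker in entry:
                return label
    return None


# Usage with h5py (the file name is hypothetical):
#
#   import h5py
#   with h5py.File("some_file.h5", "r") as f:
#       print(detect_layout(f))
```

A None result means that none of the listed layouts was recognised; the search paths and the metadata interpretation then still have to be worked out by hand.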