Hdf5JavaLib

Hdf5JavaLib: Read Capabilities (Version 0.2.1)

Hdf5JavaLib is a pure Java library for reading HDF5 files, released as version 0.2.1. This guide helps users read datasets in the root group using examples from the org.hdf5javalib.examples.read package. The library reads HDF5 files generated by the C++ HDF5 library, supporting compound datasets, multi-dimensional data, and various datatypes. New in version 0.2.1: enhanced sequential and parallel streaming, array flattening, slicing, reducing along axes, filtering with coordinate lists, and custom type converters for compound datasets.

Setup

Add the Dependency

Add Hdf5JavaLib to your project via Maven Central:

<dependency>
    <groupId>org.hdf5javalib</groupId>
    <artifactId>hdf5javalib</artifactId>
    <version>0.2.1</version>
</dependency>

Requirements

Java 17 or later.

Alternative: Build from Source

  1. Clone the repository:
    git clone https://github.com/karlnicholas/Hdf5JavaLib.git
    
  2. Build with Maven:
    cd Hdf5JavaLib
    mvn install
    

HDF5 Files

Read Capabilities

Hdf5JavaLib supports reading datasets in the root group (/), including:

  • Compound datasets, mapped to HdfCompound, Java records, or custom classes via converters
  • Scalar, vector (1D), matrix (2D), and higher-dimensional (e.g. 4D) data
  • Fixed-point, floating-point, time, string, bitfield, opaque, reference, enum, array, and variable-length datatypes

Usage

Use the HdfFileReader class (in org.hdf5javalib.hdfjava) to read HDF5 files. The TypedDataSource class provides typed data access (scalar, vector, matrix, flattened) and streaming. HdfDisplayUtils simplifies data display. The org.hdf5javalib.examples.read package demonstrates reading various datasets with advanced processing.

Example 1: Read a Compound Dataset

This example reads a compound dataset (/CompoundData) from an HDF5 file, mapping it to a custom Java record with fixed-point, floating-point, time, string, bitfield, opaque, nested compound, reference, enum, array, and variable-length members. It also demonstrates streaming the data and counting rows.

package org.hdf5javalib.examples.read;

import org.hdf5javalib.dataclass.*;
import org.hdf5javalib.datasource.TypedDataSource;
import org.hdf5javalib.hdfjava.HdfDataFile;
import org.hdf5javalib.hdfjava.HdfDataset;
import org.hdf5javalib.hdfjava.HdfFileReader;

import java.io.IOException;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.BitSet;
import java.util.concurrent.atomic.AtomicInteger;

public class CompoundRead {
    private static final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(CompoundRead.class);

    public static void main(String[] args) {
        new CompoundRead().run();
    }

    private void run() {
        try {
            Path filePath = getResourcePath("compound_example.h5");
            try (SeekableByteChannel channel = Files.newByteChannel(filePath, StandardOpenOption.READ)) {
                HdfFileReader reader = new HdfFileReader(channel).readFile();
                try (HdfDataset dataSet = reader.getDataset("/CompoundData").orElseThrow()) {
                    displayData(channel, dataSet, reader);
                }
                log.debug("File BTree: {} ", reader.getBTree());
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private Path getResourcePath(String fileName) {
        String resourcePath = getClass().getClassLoader().getResource(fileName).getPath();
        if (System.getProperty("os.name").toLowerCase().contains("windows") && resourcePath.startsWith("/")) {
            resourcePath = resourcePath.substring(1);
        }
        return Paths.get(resourcePath);
    }

    public record Record(
            Integer fixed_point,              // int32_t fixed_point
            Float floating_point,             // float floating_point
            Long time,                        // uint64_t time (Class 2 Time)
            String string,                    // char string[16]
            BitSet bit_field,                 // uint8_t bit_field
            HdfOpaque opaque,                 // uint8_t opaque[4]
            Compound compound,                // nested struct compound
            HdfReference reference,           // hobj_ref_t reference
            HdfEnum enumerated,               // int enumerated (LOW, MEDIUM, HIGH)
            HdfArray array,                   // int array[3]
            HdfVariableLength variable_length // hvl_t variable_length
    ) {
        public record Compound(
                Integer nested_int,          // int16_t nested_int
                Double nested_double         // double nested_double
        ) {
        }

        public enum Level {
            LOW(0), MEDIUM(1), HIGH(2);
            private final int value;

            Level(int value) {
                this.value = value;
            }

            public int getValue() {
                return value;
            }
        }
    }

    public void displayData(SeekableByteChannel seekableByteChannel, HdfDataset dataSet, HdfDataFile hdfDataFile) throws IOException {
        System.out.println("Ten Rows:");
        new TypedDataSource<>(seekableByteChannel, hdfDataFile, dataSet, HdfCompound.class)
                .streamVector()
                .limit(10)
                .forEach(c -> System.out.println("Row: " + c.getMembers()));

        // Stream the full dataset again, decoding every row, and count the rows.
        AtomicInteger rowCount = new AtomicInteger(0);
        new TypedDataSource<>(seekableByteChannel, hdfDataFile, dataSet, HdfCompound.class)
                .streamVector()
                .forEach(c -> {
                    c.getMembers(); // decode the row; the result is intentionally unused
                    rowCount.incrementAndGet();
                });
        System.out.println("DONE: " + rowCount.get());
    }
}

Output (example):

Ten Rows:
Row: [fixed_point=0, floating_point=0.0, time=0, string=string0, bit_field=00000001, opaque=opaque0, compound=[nested_int=0, nested_double=0.0], reference=reference0, enumerated=LOW, array=[0, 0, 0], variable_length=[0, 1, 2, 3, 4]]
Row: [fixed_point=1, floating_point=1.0, time=1, string=string1, bit_field=00000010, opaque=opaque1, compound=[nested_int=1, nested_double=1.0], reference=reference1, enumerated=MEDIUM, array=[1, 1, 1], variable_length=[1, 2, 3, 4, 5]]
...
DONE: 10000

Note: Replace getResourcePath with your own file loading logic (e.g., Files.newByteChannel(Paths.get("path/to/compound_example.h5"), StandardOpenOption.READ)) for custom HDF5 files.
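In the sample output above, bit_field values such as 00000001 are an 8-bit BitSet printed MSB-first. If you need the same rendering for your own BitSet values, a small self-contained helper (plain Java, independent of Hdf5JavaLib) might look like this:

```java
import java.util.BitSet;

public class BitFieldFormat {
    // Render the low `width` bits of a BitSet as an MSB-first string,
    // e.g. bit 0 set -> "00000001".
    static String toBitString(BitSet bits, int width) {
        StringBuilder sb = new StringBuilder(width);
        for (int i = width - 1; i >= 0; i--) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet b = BitSet.valueOf(new byte[]{0b0000_0010}); // bit 1 set
        System.out.println(toBitString(b, 8)); // prints 00000010
    }
}
```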

Example 2: Read Scalar Datasets

This example reads multiple scalar datasets from twenty_datasets.h5, displaying values as Long.

package org.hdf5javalib.examples.read;

import org.hdf5javalib.hdfjava.HdfDataset;
import org.hdf5javalib.hdfjava.HdfFileReader;
import org.hdf5javalib.utils.HdfDisplayUtils;

import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

import static org.hdf5javalib.utils.HdfReadUtils.getResourcePath;

public class TwentyScalarRead {
    private static final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(TwentyScalarRead.class);

    public static void main(String[] args) throws Exception {
        new TwentyScalarRead().run();
    }

    private void run() throws Exception {
        Path filePath = getResourcePath("twenty_datasets.h5");
        try (SeekableByteChannel channel = Files.newByteChannel(filePath, StandardOpenOption.READ)) {
            HdfFileReader reader = new HdfFileReader(channel).readFile();
            for (HdfDataset dataSet : reader.getDatasets()) {
                try (HdfDataset ds = dataSet) {
                    HdfDisplayUtils.displayScalarData(channel, ds, Long.class, reader);
                }
            }
            log.debug("Superblock: {} ", reader.getSuperblock());
        }
    }
}

Output (example):

Dataset0: 123
Dataset1: 456
...
Dataset19: 789

Note: Use your own HDF5 file path if not using resources.

Example 3: Read String Vectors

This example reads vectors of ASCII and UTF-8 strings from ascii_dataset.h5 and utf8_dataset.h5.

package org.hdf5javalib.examples.read;

import org.hdf5javalib.hdfjava.HdfDataset;
import org.hdf5javalib.hdfjava.HdfFileReader;
import org.hdf5javalib.utils.HdfDisplayUtils;

import java.io.FileInputStream;
import java.nio.channels.FileChannel;
import java.util.Objects;

public class StringRead {
    public static void main(String[] args) throws Exception {
        new StringRead().run();
    }

    private void run() throws Exception {
        // The ASCII and UTF-8 files are read identically; loop over both resources.
        for (String resource : new String[]{"/ascii_dataset.h5", "/utf8_dataset.h5"}) {
            String filePath = Objects.requireNonNull(StringRead.class.getResource(resource)).getFile();
            try (FileInputStream fis = new FileInputStream(filePath)) {
                FileChannel channel = fis.getChannel();
                HdfFileReader reader = new HdfFileReader(channel).readFile();
                try (HdfDataset dataSet = reader.getDataset("/strings").orElseThrow()) {
                    HdfDisplayUtils.displayVectorData(channel, dataSet, String.class, reader);
                }
            }
        }
    }
}

Output (example):

["Hello", "World", "HDF5"]
["UTF-8 String1", "UTF-8 String2"]

Note: Replace resource loading with FileChannel.open(Paths.get("path/to/file.h5")) for custom files.

Example 4: Read Datasets with Different Dimensions

This example reads scalar, 1D, 2D, and array datasets from array_datasets.h5, displaying their data.

package org.hdf5javalib.examples.read;

import org.hdf5javalib.hdfjava.HdfDataset;
import org.hdf5javalib.hdfjava.HdfFileReader;

import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

import static org.hdf5javalib.utils.HdfDisplayUtils.displayData;
import static org.hdf5javalib.utils.HdfReadUtils.getResourcePath;

public class DimensionsRead {
    private static final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(DimensionsRead.class);

    public static void main(String[] args) {
        new DimensionsRead().run();
    }

    private void run() {
        try {
            Path filePath = getResourcePath("array_datasets.h5");
            try (SeekableByteChannel channel = Files.newByteChannel(filePath, StandardOpenOption.READ)) {
                HdfFileReader reader = new HdfFileReader(channel).readFile();
                log.debug("File BTree: {} ", reader.getBTree());
                for (HdfDataset dataSet : reader.getDatasets()) {
                    displayData(channel, dataSet, reader);
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

Output (example):

Scalar dataset: 3.14
Vector dataset: [1.0, 2.0, 3.0]
Matrix dataset:
1.0 2.0
3.0 4.0

Note: Adapt getResourcePath for your file system.

Example 5: Read Fixed-Point Data with Advanced Operations

This example reads fixed-point data from scalar, matrix, and 4D datasets, demonstrating streaming, flattening, slicing, and filtering.

package org.hdf5javalib.examples.read;

import org.hdf5javalib.dataclass.HdfFixedPoint;
import org.hdf5javalib.datasource.TypedDataSource;
import org.hdf5javalib.hdfjava.HdfDataFile;
import org.hdf5javalib.hdfjava.HdfDataset;
import org.hdf5javalib.hdfjava.HdfFileReader;
import org.hdf5javalib.utils.FlattenedArrayUtils;

import java.io.IOException;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import static org.hdf5javalib.utils.HdfReadUtils.getResourcePath;

public class FixedPointRead {
    private static final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(FixedPointRead.class);

    public static void main(String[] args) throws Exception {
        new FixedPointRead().run();
    }

    void run() throws Exception {
        Path filePath = getResourcePath("scalar.h5");
        try (SeekableByteChannel channel = Files.newByteChannel(filePath, StandardOpenOption.READ)) {
            HdfFileReader reader = new HdfFileReader(channel).readFile();
            log.debug("File BTree: {} ", reader.getBTree());
            tryScalarDataSpliterator(channel, reader, reader.getDatasets().get(0));
        }

        filePath = getResourcePath("weatherdata.h5");
        try (SeekableByteChannel channel = Files.newByteChannel(filePath, StandardOpenOption.READ)) {
            HdfFileReader reader = new HdfFileReader(channel).readFile();
            tryMatrixSpliterator(channel, reader, reader.getDataset("/weatherdata").orElseThrow());
        }

        filePath = getResourcePath("tictactoe_4d_state.h5");
        try (SeekableByteChannel channel = Files.newByteChannel(filePath, StandardOpenOption.READ)) {
            HdfFileReader reader = new HdfFileReader(channel).readFile();
            display4DData(channel, reader, reader.getDataset("/game").orElseThrow());
        }
    }

    void tryScalarDataSpliterator(SeekableByteChannel channel, HdfDataFile hdfDataFile, HdfDataset dataSet) throws IOException {
        TypedDataSource<BigInteger> dataSource = new TypedDataSource<>(channel, hdfDataFile, dataSet, BigInteger.class);
        BigInteger allData = dataSource.readScalar();
        System.out.println("Scalar dataset name = " + dataSet.getObjectName());
        System.out.println("Scalar readAll stats = " + Stream.of(allData)
                .collect(Collectors.summarizingInt(BigInteger::intValue)));
        System.out.println("Scalar streaming list = " + dataSource.streamScalar().toList());
        System.out.println("Scalar parallelStreaming list = " + dataSource.parallelStreamScalar().toList());
    }

    void tryMatrixSpliterator(SeekableByteChannel fileChannel, HdfDataFile hdfDataFile, HdfDataset dataSet) throws IOException {
        TypedDataSource<BigDecimal> dataSource = new TypedDataSource<>(fileChannel, hdfDataFile, dataSet, BigDecimal.class);
        BigDecimal[][] allData = dataSource.readMatrix();
        System.out.println("Matrix readAll() = ");
        for (BigDecimal[] allDatum : allData) {
            for (BigDecimal bigDecimal : allDatum) {
                System.out.print(bigDecimal.setScale(2, RoundingMode.HALF_UP) + " ");
            }
            System.out.println();
        }
    }

    void display4DData(SeekableByteChannel fileChannel, HdfDataFile hdfDataFile, HdfDataset dataSet) throws IOException {
        TypedDataSource<Integer> dataSource = new TypedDataSource<>(fileChannel, hdfDataFile, dataSet, Integer.class);
        int[] shape = dataSource.getShape();
        Integer[][][] step0 = (Integer[][][]) FlattenedArrayUtils.sliceStream(
                dataSource.streamFlattened(), dataSource.getShape(),
                new int[][]{{}, {}, {}, {0}}, Integer.class);
        System.out.println("Step 0:");
        for (int x = 0; x < shape[0]; x++) {
            for (int y = 0; y < shape[1]; y++) {
                for (int z = 0; z < shape[2]; z++) {
                    Integer value = step0[x][y][z];
                    System.out.printf("(%d %d %d) %s%n", x, y, z, value);
                }
            }
        }
    }
}

Output (example):

Scalar dataset name = /scalar
Scalar readAll stats = IntSummaryStatistics{count=1, sum=42, min=42, average=42.000000, max=42}
Scalar streaming list = [42]
Scalar parallelStreaming list = [42]
Matrix readAll() =
1.00 2.00
3.00 4.00
Step 0:
(0 0 0) 0
(0 0 1) 1
...

Note: Use your own file paths for scalar.h5, weatherdata.h5, and tictactoe_4d_state.h5.
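Slicing a flattened stream comes down to row-major index arithmetic: the last axis varies fastest, so fixing it to 0 (as in the {{}, {}, {}, {0}} slice above) selects every shape[3]-th element. This sketch is illustrative only, not the library's actual implementation:

```java
public class RowMajorIndex {
    // Row-major flat index for `coords` within `shape`; the last axis varies fastest.
    static int flatIndex(int[] shape, int... coords) {
        int idx = 0;
        for (int axis = 0; axis < shape.length; axis++) {
            idx = idx * shape[axis] + coords[axis];
        }
        return idx;
    }

    public static void main(String[] args) {
        int[] shape = {3, 3, 3, 9}; // e.g. a 3x3x3 board tracked over 9 steps
        // With the last axis fixed to 0, consecutive (x, y, z) cells land
        // at flat indices 0, 9, 18, ... in the flattened stream.
        System.out.println(flatIndex(shape, 0, 0, 0, 0)); // 0
        System.out.println(flatIndex(shape, 0, 0, 1, 0)); // 9
        System.out.println(flatIndex(shape, 0, 1, 0, 0)); // 27
    }
}
```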

Example 6: Read Separate Datatypes

This example reads datasets for various datatypes (fixed-point, float, time, string, etc.) from all_types_separate.h5, using native HDF5 types, Java types, and a custom compound converter.

package org.hdf5javalib.examples.read;

import org.hdf5javalib.dataclass.*;
import org.hdf5javalib.datatype.CompoundDatatype;
import org.hdf5javalib.hdfjava.HdfDataset;
import org.hdf5javalib.hdfjava.HdfFileReader;
import org.hdf5javalib.utils.HdfDisplayUtils;

import java.io.FileInputStream;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.channels.FileChannel;
import java.util.BitSet;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

public class SeparateTypesRead {
    private static final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(SeparateTypesRead.class);

    public static void main(String[] args) throws Exception {
        new SeparateTypesRead().run();
    }

    private void run() throws Exception {
        String filePath = Objects.requireNonNull(this.getClass().getResource("/all_types_separate.h5")).getFile();
        try (FileInputStream fis = new FileInputStream(filePath)) {
            FileChannel channel = fis.getChannel();
            HdfFileReader reader = new HdfFileReader(channel).readFile();
            try (HdfDataset dataSet = reader.getDataset("/fixed_point").orElseThrow()) {
                HdfDisplayUtils.displayScalarData(channel, dataSet, HdfFixedPoint.class, reader);
                HdfDisplayUtils.displayScalarData(channel, dataSet, Integer.class, reader);
                HdfDisplayUtils.displayScalarData(channel, dataSet, Long.class, reader);
                HdfDisplayUtils.displayScalarData(channel, dataSet, BigInteger.class, reader);
                HdfDisplayUtils.displayScalarData(channel, dataSet, BigDecimal.class, reader);
                HdfDisplayUtils.displayScalarData(channel, dataSet, String.class, reader);
            }
            try (HdfDataset dataSet = reader.getDataset("/float").orElseThrow()) {
                HdfDisplayUtils.displayScalarData(channel, dataSet, HdfFloatPoint.class, reader);
                HdfDisplayUtils.displayScalarData(channel, dataSet, Float.class, reader);
                HdfDisplayUtils.displayScalarData(channel, dataSet, Double.class, reader);
                HdfDisplayUtils.displayScalarData(channel, dataSet, String.class, reader);
            }
            try (HdfDataset dataSet = reader.getDataset("/compound").orElseThrow()) {
                HdfDisplayUtils.displayScalarData(channel, dataSet, HdfCompound.class, reader);
                CompoundDatatype.addConverter(CustomCompound.class, (bytes, compoundDataType) -> {
                    Map<String, HdfCompoundMember> nameToMember = compoundDataType.getInstance(HdfCompound.class, bytes)
                            .getMembers()
                            .stream()
                            .collect(Collectors.toMap(m -> m.getDatatype().getName(), m -> m));
                    return CustomCompound.builder()
                            .name("Name")
                            .someShort(nameToMember.get("a").getInstance(Short.class))
                            .someDouble(nameToMember.get("b").getInstance(Double.class))
                            .build();
                });
                HdfDisplayUtils.displayScalarData(channel, dataSet, CustomCompound.class, reader);
            }
            log.info("Superblock: {}", reader.getSuperblock());
        }
    }

    public static class Compound {
        private Short a;
        private Double b;

        public Short getA() { return a; }
        public void setA(Short a) { this.a = a; }
        public Double getB() { return b; }
        public void setB(Double b) { this.b = b; }
    }

    public static class CustomCompound {
        private String name;
        private Short someShort;
        private Double someDouble;

        private CustomCompound() {}

        public static Builder builder() { return new Builder(); }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Short getSomeShort() { return someShort; }
        public void setSomeShort(Short someShort) { this.someShort = someShort; }
        public Double getSomeDouble() { return someDouble; }
        public void setSomeDouble(Double someDouble) { this.someDouble = someDouble; }

        public static class Builder {
            private final CustomCompound instance = new CustomCompound();

            public Builder name(String name) {
                instance.setName(name);
                return this;
            }

            public Builder someShort(Short someShort) {
                instance.setSomeShort(someShort);
                return this;
            }

            public Builder someDouble(Double someDouble) {
                instance.setSomeDouble(someDouble);
                return this;
            }

            public CustomCompound build() { return instance; }
        }
    }
}

Output (example):

HdfFixedPoint: 42
Integer: 42
Long: 42
BigInteger: 42
BigDecimal: 42
String: 42
HdfFloatPoint: 3.14
Float: 3.14
Double: 3.14
String: 3.14
HdfCompound: [a=10, b=20.5]
CustomCompound: Name=Name, someShort=10, someDouble=20.5

Note: Only a subset of datatypes is shown for brevity.

Example 7: Read Variable-Length Datatypes

This example reads variable-length datasets from vlen_types_example.h5, displaying them as HdfVariableLength, String, and Object.

package org.hdf5javalib.examples.read;

import org.hdf5javalib.dataclass.HdfVariableLength;
import org.hdf5javalib.hdfjava.HdfDataset;
import org.hdf5javalib.hdfjava.HdfFileReader;
import org.hdf5javalib.utils.HdfDisplayUtils;

import java.io.FileInputStream;
import java.nio.channels.FileChannel;
import java.util.Objects;

public class VLenTypesRead {
    private static final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(VLenTypesRead.class);

    public static void main(String[] args) throws Exception {
        new VLenTypesRead().run();
    }

    private void run() throws Exception {
        String filePath = Objects.requireNonNull(this.getClass().getResource("/vlen_types_example.h5")).getFile();
        try (FileInputStream fis = new FileInputStream(filePath)) {
            FileChannel channel = fis.getChannel();
            HdfFileReader reader = new HdfFileReader(channel).readFile();
            for (HdfDataset dataSet : reader.getDatasets()) {
                try (HdfDataset ds = dataSet) {
                    System.out.println("Dataset name: " + ds.getObjectName());
                    HdfDisplayUtils.displayScalarData(channel, ds, HdfVariableLength.class, reader);
                    HdfDisplayUtils.displayScalarData(channel, ds, String.class, reader);
                    HdfDisplayUtils.displayScalarData(channel, ds, Object.class, reader);
                }
            }
            log.info("Superblock: {}", reader.getSuperblock());
        }
    }
}

Output (example):

Dataset name: /vlen_dataset
HdfVariableLength: [1, 2, 3, 4]
String: [1, 2, 3, 4]
Object: [1, 2, 3, 4]

Note: Use FileInputStream or Files.newByteChannel for your own files.

Running Examples

  1. Using Maven Central:
    • Add the dependency to your pom.xml.
    • Place your HDF5 files in src/main/resources or a file system path.
    • Compile and run:
      mvn compile
      java -cp target/classes org.hdf5javalib.examples.read.CompoundRead
      
  2. From Source:
    • Example HDF5 files are in src/test/resources.
    • Run tests to verify:
      mvn test
      
    • Run examples:
      mvn compile
      java -cp target/classes org.hdf5javalib.examples.read.CompoundRead
      
  3. Tips:
    • Use Java 17+.
    • Ensure HDF5 files are accessible via the file system or classpath.
    • For custom HDF5 files, use Files.newByteChannel or FileChannel.open to create a SeekableByteChannel.
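For the last tip, creating a SeekableByteChannel needs only standard Java NIO. This sketch writes a throwaway temp file so it runs standalone; for a real HDF5 file, substitute your own path and drop the setup:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OpenChannel {
    public static void main(String[] args) throws IOException {
        // Placeholder setup: a small temp file so the example is self-contained.
        Path path = Files.createTempFile("example", ".h5");
        Files.write(path, new byte[]{(byte) 0x89, 'H', 'D', 'F'});

        // The same pattern applies to a real HDF5 file path.
        try (SeekableByteChannel channel = Files.newByteChannel(path, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(4);
            channel.read(buf);
            System.out.println("read " + buf.position() + " bytes, size " + channel.size());
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```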

Feedback

Help improve Hdf5JavaLib by reporting issues at GitHub Issues. Please include:

Visit https://www.hdf5javalib.org for updates and resources.