Python tutorial
This tutorial provides a full worked example of using ome-zarr-models in Python.
import os
import tempfile
import matplotlib.pyplot as plt
import numpy as np
import zarr
import zarr.storage
from pydantic_zarr.v3 import AnyArraySpec, ArraySpec, NamedConfig
from rich.pretty import pprint
from ome_zarr_models import open_ome_zarr
from ome_zarr_models.v05.axes import Axis
from ome_zarr_models.v05.image import Image
Loading datasets
OME-Zarr datasets are Zarr groups with specific metadata. To open an OME-Zarr dataset, we first open the Zarr group.
zarr_group = zarr.open_group(
    "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0066/ExpD_chicken_embryo_MIP.ome.zarr",
    mode="r",
)
If you're not sure what type or OME-Zarr version of data you have, you can
use open_ome_zarr() to automatically 'guess' the correct group:
ome_zarr_group = open_ome_zarr(zarr_group)
print(f"Group class: {type(ome_zarr_group)}")
print(f"OME-Zarr version: {ome_zarr_group.ome_zarr_version}")
Group class: <class 'ome_zarr_models.v05.image.Image'>
OME-Zarr version: 0.5
If you already know the data type you're loading, it's better to load directly from that class (see the API reference for a list of classes). This will validate the metadata:
ome_zarr_image = Image.from_zarr(zarr_group)
No errors, which means the metadata is valid 🎉
Accessing metadata
To access the OME-Zarr metadata, use the .ome_attributes property:
metadata = ome_zarr_image.ome_attributes
pprint(metadata)
ImageAttrs(
    version='0.5',
    multiscales=[
        Multiscale(
            axes=[
                Axis(name='y', type='space', unit='micrometer'),
                Axis(name='x', type='space', unit='micrometer')
            ],
            datasets=(
                Dataset(path='0', coordinateTransformations=(VectorScale(type='scale', scale=[1.6, 1.6]),)),
                Dataset(path='1', coordinateTransformations=(VectorScale(type='scale', scale=[3.2, 3.2]),)),
                Dataset(path='2', coordinateTransformations=(VectorScale(type='scale', scale=[6.4, 6.4]),)),
                Dataset(path='3', coordinateTransformations=(VectorScale(type='scale', scale=[12.8, 12.8]),)),
                Dataset(path='4', coordinateTransformations=(VectorScale(type='scale', scale=[25.6, 25.6]),)),
                Dataset(path='5', coordinateTransformations=(VectorScale(type='scale', scale=[51.2, 51.2]),)),
                Dataset(path='6', coordinateTransformations=(VectorScale(type='scale', scale=[102.4, 102.4]),)),
                Dataset(path='7', coordinateTransformations=(VectorScale(type='scale', scale=[204.8, 204.8]),))
            ),
            coordinateTransformations=None,
            metadata=None,
            name='/',
            type=None
        )
    ],
    _creator={'name': 'ome2024-ngff-challenge', 'version': '1.0.2', 'notes': None},
    omero={
        'channels': [
            {
                'active': True,
                'coefficient': 1.0,
                'color': 'FFFFFF',
                'family': 'linear',
                'inverted': False,
                'label': 'Cy3',
                'window': {'end': 55.0, 'max': 255.0, 'min': 0.0, 'start': 0.0}
            }
        ],
        'id': 1,
        'rdefs': {'defaultT': 0, 'defaultZ': 0, 'model': 'greyscale'}
    }
)
As an example of getting more specific metadata, let's get the metadata for all the datasets in the first multiscale:
pprint(ome_zarr_image.datasets[0])
(
    Dataset(path='0', coordinateTransformations=(VectorScale(type='scale', scale=[1.6, 1.6]),)),
    Dataset(path='1', coordinateTransformations=(VectorScale(type='scale', scale=[3.2, 3.2]),)),
    Dataset(path='2', coordinateTransformations=(VectorScale(type='scale', scale=[6.4, 6.4]),)),
    Dataset(path='3', coordinateTransformations=(VectorScale(type='scale', scale=[12.8, 12.8]),)),
    Dataset(path='4', coordinateTransformations=(VectorScale(type='scale', scale=[25.6, 25.6]),)),
    Dataset(path='5', coordinateTransformations=(VectorScale(type='scale', scale=[51.2, 51.2]),)),
    Dataset(path='6', coordinateTransformations=(VectorScale(type='scale', scale=[102.4, 102.4]),)),
    Dataset(path='7', coordinateTransformations=(VectorScale(type='scale', scale=[204.8, 204.8]),))
)
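The scale values in this output give the pixel size at each resolution level, so dividing each by the level-0 scale recovers the downsampling factor of that level. The following is a minimal sketch of that arithmetic; the scale values are copied by hand from the output above rather than read from the model, and the variable names are illustrative only.

```python
# Per-level (y, x) scales, copied from the dataset metadata printed above.
scales = [
    (1.6, 1.6),
    (3.2, 3.2),
    (6.4, 6.4),
    (12.8, 12.8),
    (25.6, 25.6),
    (51.2, 51.2),
    (102.4, 102.4),
    (204.8, 204.8),
]

# Level 0 has the smallest pixel size, so it is the full-resolution level.
base_y, base_x = scales[0]

# Downsampling factor of each level relative to the full-resolution level.
factors = [(round(sy / base_y), round(sx / base_x)) for sy, sx in scales]

for level, factor in enumerate(factors):
    print(f"level {level}: downsampled by {factor}")
```

Each level here is downsampled by a further factor of two along both axes.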
Accessing data
Although these models do not handle reading or writing data, they do give access to
the Zarr arrays using the zarr-python library.
For example, to get the image at resolution level 3 (index 0 is the highest resolution):
zarr_arr = zarr_group[ome_zarr_image.datasets[0][3].path]
pprint(zarr_arr)
<Array <FsspecStore(HTTPFileSystem, https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0066/ExpD_chicken_embryo_MIP.ome.zarr)>/3 shape=(1122, 813) dtype=uint8>
To finish off this section on accessing data, let's plot this image:
plt.imshow(zarr_arr, cmap="gray")
<matplotlib.image.AxesImage at 0x7dfe8170d010>
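Combining the array shape with the scale for the same level gives the physical size of the image. The sketch below hard-codes the shape and scale shown above for level 3 instead of reading them from the objects, and assumes the scale is the pixel size in micrometers, as the axis units in the metadata indicate.

```python
# Shape (y, x) of the level-3 array and its scale, both taken from the output above.
shape = (1122, 813)
scale = (12.8, 12.8)  # assumed to be micrometers per pixel along (y, x)

# Physical extent along each axis = number of pixels * pixel size.
extent_um = tuple(n * s for n, s in zip(shape, scale))

print(f"Physical size (y, x): {extent_um[0]:.1f} x {extent_um[1]:.1f} micrometer")
```

Values like these could be passed to the extent argument of plt.imshow to label the plot axes in physical units rather than pixel indices.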
Creating new datasets
To create new OME-Zarr datasets, the .new() method on the OME-Zarr groups
can be used. This creates all the Zarr groups, Zarr arrays within those groups,
and related metadata, but does not write any data to the Zarr arrays.
As an example we'll create an OME-Zarr image with two arrays, one at the original resolution and one downsampled version.
First, we need to create ArraySpec objects, which tell ome-zarr-models
what the structure of the data arrays will be.
array_specs: list[AnyArraySpec] = [
    ArraySpec(
        shape=(100, 100),
        data_type=np.uint16,
        chunk_grid=NamedConfig(
            name="regular",
            configuration={"chunk_shape": [32, 32]},
        ),
        chunk_key_encoding=NamedConfig(
            name="default", configuration={"separator": "/"}
        ),
        fill_value=0,
        codecs=[NamedConfig(name="bytes")],
        dimension_names=["y", "x"],
        attributes={},
    ),
    ArraySpec(
        shape=(100, 100),
        data_type=np.uint16,
        chunk_grid=NamedConfig(
            name="regular",
            configuration={"chunk_shape": [32, 32]},
        ),
        chunk_key_encoding=NamedConfig(
            name="default", configuration={"separator": "/"}
        ),
        fill_value=0,
        codecs=[NamedConfig(name="bytes")],
        dimension_names=["y", "x"],
        attributes={},
    ),
]
Next, we'll set some metadata values:
pixel_size = (6, 4)
pixel_unit = "um"
Finally, we can use these variables to create a new OME-Zarr image group.
ome_zarr_image = Image.new(
    array_specs=array_specs,
    paths=["level0", "level1"],
    axes=[
        Axis(name="y", type="space", unit=pixel_unit),
        Axis(name="x", type="space", unit=pixel_unit),
    ],
    scales=[[p * 1 for p in pixel_size], [p * 2 for p in pixel_size]],
    translations=[[0, 0], [p * 0.5 for p in pixel_size]],
)
pprint(ome_zarr_image)
Image(
    zarr_format=3,
    node_type='group',
    attributes=BaseZarrAttrs[ImageAttrs](
        ome=ImageAttrs(
            version='0.5',
            multiscales=[
                Multiscale(
                    axes=(Axis(name='y', type='space', unit='um'), Axis(name='x', type='space', unit='um')),
                    datasets=(
                        Dataset(
                            path='level0',
                            coordinateTransformations=(
                                VectorScale(type='scale', scale=[6.0, 4.0]),
                                VectorTranslation(type='translation', translation=[0.0, 0.0])
                            )
                        ),
                        Dataset(
                            path='level1',
                            coordinateTransformations=(
                                VectorScale(type='scale', scale=[12.0, 8.0]),
                                VectorTranslation(type='translation', translation=[3.0, 2.0])
                            )
                        )
                    ),
                    coordinateTransformations=None,
                    metadata=None,
                    name=None,
                    type=None
                )
            ]
        )
    ),
    members={
        'level0': ArraySpec(
            zarr_format=3,
            node_type='array',
            attributes={},
            shape=(100, 100),
            data_type='uint16',
            chunk_grid={'name': 'regular', 'configuration': {'chunk_shape': (32, 32)}},
            chunk_key_encoding={'name': 'default', 'configuration': {'separator': '/'}},
            fill_value=0,
            codecs=({'name': 'bytes'},),
            storage_transformers=(),
            dimension_names=('y', 'x')
        ),
        'level1': ArraySpec(
            zarr_format=3,
            node_type='array',
            attributes={},
            shape=(100, 100),
            data_type='uint16',
            chunk_grid={'name': 'regular', 'configuration': {'chunk_shape': (32, 32)}},
            chunk_key_encoding={'name': 'default', 'configuration': {'separator': '/'}},
            fill_value=0,
            codecs=({'name': 'bytes'},),
            storage_transformers=(),
            dimension_names=('y', 'x')
        )
    }
)
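The half-pixel translation on level1 follows from how a scale followed by a translation maps an array index to a physical coordinate: coordinate = scale * index + translation. The sketch below applies that mapping by hand, using the values from the level0 and level1 transformations above, to check that the first level1 pixel sits midway between the first two level0 pixels. It assumes integer indices refer to pixel centers, and index_to_physical is a hypothetical helper, not part of ome-zarr-models.

```python
def index_to_physical(index, scale, translation):
    """Map an array index to physical coordinates: scale first, then translate."""
    return tuple(s * i + t for i, s, t in zip(index, scale, translation))


# Values from the level0 and level1 coordinate transformations above.
level0 = {"scale": (6.0, 4.0), "translation": (0.0, 0.0)}
level1 = {"scale": (12.0, 8.0), "translation": (3.0, 2.0)}

# Physical positions of the first two level0 pixels along each axis.
p0 = index_to_physical((0, 0), **level0)
p1 = index_to_physical((1, 1), **level0)

# Midpoint of those two pixels, where a 2x-downsampled pixel would be centered.
midpoint = tuple((a + b) / 2 for a, b in zip(p0, p1))

# Physical position of the first level1 pixel.
q0 = index_to_physical((0, 0), **level1)

print(f"level0 midpoint: {midpoint}, level1 first pixel: {q0}")
```

The two positions agree, which is why a level downsampled by a factor of two is given a translation of half the level-0 pixel size.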
It's also possible to create array metadata from existing arrays.
For numpy arrays:
arr0 = np.zeros(shape=(100, 100), dtype=np.uint16)
arr1 = np.zeros(shape=(50, 50), dtype=np.uint16)
array_specs = [ArraySpec.from_array(arr0), ArraySpec.from_array(arr1)]
or for Zarr arrays:
arr_zarr0 = zarr.zeros(shape=(100, 100), dtype=np.uint16, zarr_format=2)
arr_zarr1 = zarr.zeros(shape=(50, 50), dtype=np.uint16, zarr_format=2)
array_specs = [ArraySpec.from_array(arr_zarr0), ArraySpec.from_array(arr_zarr1)]
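The arrays above are filled with zeros. With real data, the lower-resolution array would typically be computed from the full-resolution one. As one possible approach (not something ome-zarr-models provides), the sketch below averages non-overlapping 2x2 blocks with NumPy. The helper downsample_2x and the test data are hypothetical.

```python
import numpy as np


def downsample_2x(arr: np.ndarray) -> np.ndarray:
    """Average non-overlapping 2x2 blocks of a 2D array with even-length axes."""
    ny, nx = arr.shape
    if ny % 2 or nx % 2:
        raise ValueError("both axes must have even length")
    # Split each axis into (blocks, 2) so every 2x2 block gets its own pair of axes.
    blocks = arr.reshape(ny // 2, 2, nx // 2, 2)
    # The mean is computed in floating point; casting back truncates toward zero.
    return blocks.mean(axis=(1, 3)).astype(arr.dtype)


# Illustrative full-resolution data with the same shape and dtype as arr0 above.
level0 = np.arange(100 * 100, dtype=np.uint16).reshape(100, 100)
level1 = downsample_2x(level0)

print(level0.shape, level1.shape, level1.dtype)
```

Arrays produced this way can be passed to ArraySpec.from_array in the same way as the zero-filled arrays above.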
Saving datasets
At this point the ome_zarr_image object is a representation of the
OME-Zarr group in the memory of your computer.
To save a new dataset the .to_zarr(store=...) method can be used,
which will put all the OME-Zarr group metadata into a Zarr store.
In this tutorial we'll use a temporary directory to save the Zarr group to, and then list the directory to show that it has been saved.
with tempfile.TemporaryDirectory() as fp:
    store = zarr.storage.LocalStore(fp)
    ome_zarr_image.to_zarr(store=store, path="/")
    print(os.listdir(fp))
['level0', 'zarr.json', 'level1']