Hiera Encoder from Meta's SAM2.1 (Segment Anything Model)

Meta's SAM2 (Segment Anything Model 2) delivers state-of-the-art video segmentation. A core component behind this is its Hiera image encoder, which has learned strong hierarchical visual features through supervised training on object segmentation.

Meta has released the full SAM2 models and their weights, but the release is plain PyTorch code. It is not integrated with Hugging Face Transformers or with common training tooling such as Trainer and DeepSpeed.

This repository extracts the Hiera module from SAM2 and wraps it for Hugging Face compatibility by building on PretrainedConfig and PreTrainedModel, so it can be used directly in Hugging Face-style training and inference workflows.
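The wrapping pattern described above can be illustrated with a minimal sketch. This is not the repository's actual implementation: `ToyEncoderConfig`, `ToyEncoderModel`, and the single convolution standing in for the Hiera trunk are hypothetical names chosen for illustration. Only `PretrainedConfig`, `PreTrainedModel`, `save_pretrained`, and `from_pretrained` come from Transformers.

```python
import tempfile

import torch
from torch import nn
from transformers import PretrainedConfig, PreTrainedModel


class ToyEncoderConfig(PretrainedConfig):
    # Hypothetical config; a real Hiera config would carry the trunk's hyperparameters.
    model_type = "toy_encoder"

    def __init__(self, in_channels=3, embed_dim=8, patch_size=4, **kwargs):
        super().__init__(**kwargs)
        self.in_channels = in_channels
        self.embed_dim = embed_dim
        self.patch_size = patch_size


class ToyEncoderModel(PreTrainedModel):
    config_class = ToyEncoderConfig
    main_input_name = "pixel_values"

    def __init__(self, config):
        super().__init__(config)
        # Stand-in for the extracted trunk (assumption: the real wrapper
        # stores the SAM2 Hiera module in an attribute like this one).
        self.hiera = nn.Conv2d(
            config.in_channels,
            config.embed_dim,
            kernel_size=config.patch_size,
            stride=config.patch_size,
        )
        self.post_init()

    def forward(self, pixel_values):
        return self.hiera(pixel_values)


toy_model = ToyEncoderModel(ToyEncoderConfig())

# Round-trip through the standard Hugging Face save/load API.
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_model.save_pretrained(tmp_dir)
    reloaded = ToyEncoderModel.from_pretrained(tmp_dir)

features = reloaded(torch.zeros(1, 3, 16, 16))
print(tuple(features.shape))
```

Subclassing `PreTrainedModel` is what gives the wrapped module `from_pretrained` and `save_pretrained`, which is the interface that Hugging Face-style training tooling expects.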

Model Details

Original Model: facebook/sam2.1-hiera-base-plus
This Model: nkkbr/hiera-base-plus-in-sam2.1

This model exposes only the Hiera encoder extracted from SAM2.1, wrapped for Hugging Face usage.

Installation

You first need to install Meta’s original SAM2 code:

git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .

Usage

from hiera_encoder import HieraVisionModel

# Load the Hiera module from Hugging Face
model = HieraVisionModel.from_pretrained("nkkbr/hiera-base-plus-in-sam2.1")

# Get the raw Hiera model
model = model.hiera

# Print model parameters
for name, param in model.named_parameters():
    print(f"{name:50} {param.shape}")
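As a quick sanity check after loading, it can help to count the parameters and see how many are trainable. The sketch below uses a hypothetical helper, `count_parameters`, and a small stand-in module so that it runs without downloading any weights. The same function should work on the loaded Hiera module, since it relies only on the standard `nn.Module.parameters()` API.

```python
from torch import nn


def count_parameters(module):
    """Return (total, trainable) parameter counts for a torch module."""
    total = sum(p.numel() for p in module.parameters())
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    return total, trainable


# Small stand-in module: Linear(4, 3) has 15 parameters, Linear(3, 2) has 8.
toy = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))

# Freeze the first layer, as one might freeze an encoder used as a fixed feature extractor.
toy[0].requires_grad_(False)

total, trainable = count_parameters(toy)
print(f"total={total} trainable={trainable}")
```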

Weight Consistency Check

To verify that the weights are identical to those in Meta's original SAM2.1 Hiera module:

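import torch
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Load SAM2.1 predictor from Meta's official release
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2.1-hiera-base-plus")
hiera_model_in_predictor = predictor.model.image_encoder.trunk

# Fetch the reference weights once instead of rebuilding the state dict per parameter
reference_state = hiera_model_in_predictor.state_dict()

# `model` is the raw Hiera module loaded in the Usage section.
# Compare on the CPU so the check also works if the two models sit on different devices.
num_mismatched = 0
for name, param in model.named_parameters():
    if not torch.equal(param.detach().cpu(), reference_state[name].cpu()):
        num_mismatched += 1
        print(f"The parameter {name} has different weights in the two models.")

print(f"Comparison complete! {num_mismatched} mismatched parameter(s).")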
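The loop above only visits parameters that exist in the wrapped model, so it would not notice entries that appear only in the reference, and it skips buffers. A slightly stricter check compares the two state dicts key by key. The helper below, `compare_state_dicts`, is a hypothetical name and a minimal sketch. It is demonstrated on two small stand-in modules rather than on the real checkpoints.

```python
import torch
from torch import nn


def compare_state_dicts(candidate, reference):
    """Return (missing, unexpected, mismatched) key lists for two state dicts.

    missing:    keys in `reference` that `candidate` lacks
    unexpected: keys in `candidate` that `reference` lacks
    mismatched: shared keys whose tensors differ in shape or values
    """
    missing = sorted(k for k in reference if k not in candidate)
    unexpected = sorted(k for k in candidate if k not in reference)
    mismatched = sorted(
        k
        for k in candidate
        if k in reference
        and not torch.equal(candidate[k].detach().cpu(), reference[k].detach().cpu())
    )
    return missing, unexpected, mismatched


# Stand-in modules: copy the reference weights into the candidate.
reference_module = nn.Linear(4, 2)
candidate_module = nn.Linear(4, 2)
candidate_module.load_state_dict(reference_module.state_dict())
identical = compare_state_dicts(
    candidate_module.state_dict(), reference_module.state_dict()
)

# Perturb one tensor to show how a mismatch is reported.
with torch.no_grad():
    candidate_module.bias.add_(1.0)
changed = compare_state_dicts(
    candidate_module.state_dict(), reference_module.state_dict()
)

print(identical)
print(changed)
```

For the real check, one would presumably pass `model.state_dict()` and `hiera_model_in_predictor.state_dict()`. Three empty lists would mean the two sets of weights match exactly.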

License

Please refer to the SAM2 repository for license and usage terms.


Model size: 68.7M params (Safetensors, tensor type F32)
