Building a Terraform Provider for HuggingFace Endpoints

A deep dive into Terraform's plugin architecture and how to extend it for any API.

When our team adopted HuggingFace Inference Endpoints for ML model serving, we hit a familiar problem: everything else in our stack was managed with Terraform, but HuggingFace didn't have a provider. This meant manual deployments, configuration drift, and no version control for our endpoint configs.

So I built one. What started as a pragmatic solution turned into a deep exploration of how Terraform actually works—and resulted in an open-source provider that now manages all 29 of our production endpoints on GCP.

29 Endpoints Managed · GCP Cloud Provider · 0 Manual Deployments

What Terraform Actually Does

At its core, Terraform is a state reconciliation engine. You describe the infrastructure you want (desired state), Terraform compares that to what exists (actual state), and figures out what changes are needed to make reality match your description.

The workflow has three phases: init downloads providers and modules, plan computes a diff between desired and actual state, and apply executes the changes. A state file tracks what Terraform has created so it can manage those resources going forward.
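At its heart, the plan phase is a diff between two descriptions of the world. A toy sketch of that idea (simplified types, not Terraform's actual internals):

```go
package main

import "fmt"

// Action is what Terraform would do to reconcile one resource.
type Action string

const (
	Create   Action = "create"
	Update   Action = "update"
	Delete   Action = "delete"
	NoChange Action = "no-op"
)

// plan diffs desired configuration against recorded state. Both maps
// go from resource address to a flattened attribute string.
func plan(desired, actual map[string]string) map[string]Action {
	actions := map[string]Action{}
	for addr, want := range desired {
		have, exists := actual[addr]
		switch {
		case !exists:
			actions[addr] = Create
		case have != want:
			actions[addr] = Update
		default:
			actions[addr] = NoChange
		}
	}
	// Anything in state but not in configuration gets destroyed.
	for addr := range actual {
		if _, wanted := desired[addr]; !wanted {
			actions[addr] = Delete
		}
	}
	return actions
}

func main() {
	desired := map[string]string{"ep.a": "gpu-small", "ep.b": "gpu-large"}
	actual := map[string]string{"ep.a": "gpu-small", "ep.c": "cpu"}
	fmt.Println(plan(desired, actual))
}
```

The real engine also builds a dependency graph and handles unknown values, but the create/update/delete/no-op classification above is the essence of what `terraform plan` prints.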

terraform init            terraform plan            terraform apply
       │                         │                         │
       ▼                         ▼                         ▼
┌──────────────┐          ┌──────────────┐          ┌──────────────┐
│   Download   │          │   Compare    │          │   Execute    │
│  Providers   │  ────▶   │   Desired    │  ────▶   │   Changes    │
│  & Modules   │          │  vs Actual   │          │              │
└──────────────┘          └──────────────┘          └──────────────┘
                                 │                         │
                                 ▼                         ▼
                          Execution Plan             State Updated

Terraform's three-phase workflow

The Plugin Architecture

Here's the key insight: Terraform itself doesn't know how to create a GCP instance or an AWS bucket or a HuggingFace endpoint. All that knowledge lives in providers—separate binaries that communicate with Terraform Core through gRPC.

┌────────────────────────────────────────────────────────────────┐
│                         TERRAFORM CORE                         │
│                                                                │
│   .tf files ───▶ State Management ───▶ Execution Engine        │
└────────────────────────────────┬───────────────────────────────┘
                                 │ gRPC Protocol
                                 │
┌────────────────────────────────┴───────────────────────────────┐
│                        PROVIDER PLUGIN                         │
│                                                                │
│   Schema Definition ── CRUD Operations ── State Mapping        │
│                                                                │
│              terraform-provider-huggingface                    │
└────────────────────────────────┬───────────────────────────────┘
                                 │ HTTP/REST API
                                 ▼
                         HuggingFace API

Providers are separate binaries that bridge Terraform and external APIs

This architecture is elegant because it is open-ended: anyone can write a provider for any service that exposes an API. Providers handle three responsibilities: defining a schema (what resources exist and which attributes they have), implementing CRUD operations (create, read, update, delete), and mapping between Terraform's state model and the external API's representation.

Building the Provider

I didn't start from scratch. HashiCorp provides a scaffold repository that gives you a working provider skeleton with the correct project structure and build configuration. This was invaluable for getting started quickly.

Before touching any Terraform code, I created a standalone Go client for the HuggingFace API. This was a deliberate choice—a clean HTTP client is useful beyond Terraform, and I thought others might want to use it in their own projects. Separating concerns also made testing much easier.

The provider itself uses HashiCorp's terraform-plugin-framework, which handles the gRPC protocol, state serialization, and plan diffing. Your job is to define resources and implement what happens during each lifecycle operation.

Defining Resources

Each resource in a Terraform provider needs two things: a schema that describes its shape, and methods that implement its lifecycle. The schema tells Terraform what attributes exist, which are required vs optional, and which are computed by the API.

For a HuggingFace endpoint, the schema includes things like the endpoint name, the model repository, compute configuration (instance type, accelerator, scaling settings), and cloud region. Some fields like status and url are computed—they're returned by the API after creation, not specified by the user.

RESOURCE SCHEMA
┌───────────────────────────────────────────────────────────────┐
│ huggingface_endpoint                                          │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  Required                        User must provide            │
│  ├── name                        string                       │
│  ├── model.repository            string                       │
│  └── compute.instance_type       string                       │
│                                                               │
│  Optional                        Defaults if not set          │
│  ├── type                        string (default: "private")  │
│  ├── cloud.region                string (default: "us-east1") │
│  └── compute.scaling             object                       │
│                                                               │
│  Computed                        Returned by API              │
│  ├── id                          string                       │
│  ├── status                      string                       │
│  └── url                         string                       │
│                                                               │
└───────────────────────────────────────────────────────────────┘

Schema defines the shape of a resource
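In HCL, a resource following that schema might look like this. Attribute names track the sketch above and the values are made up; check the provider's registry docs for the exact shape:

```hcl
resource "huggingface_endpoint" "example" {
  name = "sentiment-classifier"
  type = "private" # optional, shown for clarity

  model = {
    repository = "distilbert-base-uncased-finetuned-sst-2-english"
  }

  cloud = {
    region = "us-east1"
  }

  compute = {
    instance_type = "nvidia-t4"
    scaling = {
      min_replica = 1
      max_replica = 2
    }
  }
}

# Computed attributes become available after apply.
output "endpoint_url" {
  value = huggingface_endpoint.example.url
}
```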

Implementing the Lifecycle

The framework requires you to implement four methods for each resource: Create, Read, Update, and Delete. These methods are where the actual API calls happen.

Create receives the planned configuration, calls the HuggingFace API to provision the endpoint, and stores the result (including computed fields like the endpoint URL) in Terraform's state.

Read is called before every plan to sync Terraform's state with reality. It fetches the current endpoint configuration from the API and updates the state. If the endpoint was deleted outside of Terraform (through the UI, for instance), Read removes it from state so Terraform knows to recreate it.

Update handles configuration changes. It compares the planned state to the current state, figures out what changed, and makes the appropriate API calls. Some changes might require replacing the resource entirely—the schema can declare which attributes force replacement.

Delete is straightforward: call the API to tear down the endpoint and remove it from state.

terraform apply
       │
       ▼
┌────────────────┐
│  Read current  │ ◀─────── GET /endpoint/{name}
│     state      │
└───────┬────────┘
        │
        ▼
┌────────────────┐
│  Compare plan  │
│    vs state    │
└───────┬────────┘
        │
   ┌────────────┼────────────┬────────────┐
   ▼            ▼            ▼            ▼
No change    Create       Update       Delete
   │            │            │            │
   │            ▼            ▼            ▼
   │          POST          PUT        DELETE
   │            │            │            │
   └────────────┴────────────┴────────────┘
                      │
                      ▼
                State updated

Each lifecycle method maps to API operations

The trickiest part was Terraform's type system. Fields aren't just "set or not"—they can be null (explicitly unset), unknown (will be computed after apply), or have an actual value. Getting this wrong produces confusing plan output where Terraform shows changes that don't actually exist.
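The tri-state can be modeled explicitly. This toy type mimics the distinction the framework's string type makes; the real framework types carry more machinery, but the three states are the same:

```go
package main

import "fmt"

// StringValue mimics Terraform's three-way distinction for an attribute:
// null (explicitly unset), unknown (computed after apply), or a value.
type StringValue struct {
	null, unknown bool
	value         string
}

func NullString() StringValue          { return StringValue{null: true} }
func UnknownString() StringValue       { return StringValue{unknown: true} }
func KnownString(s string) StringValue { return StringValue{value: s} }

func (v StringValue) IsNull() bool    { return v.null }
func (v StringValue) IsUnknown() bool { return v.unknown }

// String renders the value the way a plan would.
func (v StringValue) String() string {
	switch {
	case v.null:
		return "null"
	case v.unknown:
		return "(known after apply)"
	default:
		return fmt.Sprintf("%q", v.value)
	}
}

func main() {
	fmt.Println(NullString())           // null
	fmt.Println(UnknownString())        // (known after apply)
	fmt.Println(KnownString("running")) // "running"
}
```

Conflating null with unknown is exactly the bug that produces phantom diffs: a computed field stored as null looks like a change on every plan.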

Publishing to the Registry

Getting the provider into the official Terraform Registry was surprisingly straightforward. The key tool is GoReleaser, which handles cross-compilation for all platforms and creates signed GitHub releases.

The process works like this: you set up a GitHub Action that triggers on version tags, GoReleaser builds binaries for Linux, macOS, and Windows (both AMD64 and ARM64), signs the checksums with your GPG key, and publishes a GitHub release. The Terraform Registry syncs automatically within minutes.
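The release workflow is only a few lines of YAML. This sketch follows the shape of HashiCorp's scaffold workflow; action versions and secret names are illustrative, not prescriptive:

```yaml
# .github/workflows/release.yml (sketch)
name: release
on:
  push:
    tags: ["v*"]

jobs:
  goreleaser:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
      - name: Import GPG key
        id: import_gpg
        uses: crazy-max/ghaction-import-gpg@v6
        with:
          gpg_private_key: ${{ secrets.GPG_PRIVATE_KEY }}
          passphrase: ${{ secrets.PASSPHRASE }}
      - name: Run GoReleaser
        uses: goreleaser/goreleaser-action@v6
        with:
          args: release --clean
        env:
          GPG_FINGERPRINT: ${{ steps.import_gpg.outputs.fingerprint }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```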

Once published, anyone can use the provider by adding it to their Terraform configuration and running terraform init:

terraform {
  required_providers {
    huggingface = {
      source  = "issamemari/huggingface"
      version = "~> 1.0"
    }
  }
}

provider "huggingface" {
  namespace = var.huggingface_namespace
  token     = var.huggingface_token
}

What I Learned

Building this provider taught me more about Terraform than years of using it. A few takeaways:

The best infrastructure investments make the boring stuff automatic so you can focus on interesting problems.

We've now eliminated an entire category of incidents—no more orphaned endpoints burning money, no more "works on my endpoint" debugging sessions, no more manual processes that only one person understands. Every endpoint change goes through code review like any other infrastructure change.

Both repositories, the provider and the standalone Go client, are open source.

If you're managing HuggingFace endpoints and want to bring them into your Terraform workflow, give it a try.