Quick and Easy NER REST API with GLiNER and AWS SageMaker

18/10/2024

In this post, we take the Generalist and Lightweight Model for Named Entity Recognition (GLiNER) for a spin. It’s an awesome Named Entity Recognition model that lets you define your own entity labels. It’s also pretty resource-efficient and apparently holds its own against LLMs in zero-shot scenarios.

Here’s a basic example of how it works:

# pip install gliner
from pprint import pprint

from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_mediumv2.1")

text = """
Named Entity Recognition plays a crucial role in
various real-world applications, such as constructing knowledge graphs. Traditional NER models are
limited to a predefined set of entity types. Expanding the number of entity types can be beneficial for
many applications but may involve labeling additional datasets. The emergence of Large Language
Models, like GPT-3 (Brown et al., 2020), has introduced a new era for open-type NER by enabling
the identification of any types of entity types only
by natural language instruction. This shift signifies a significant departure from the inflexibility
observed in traditional models. However, powerful
LLMs typically consist of billions of parameters
and thus require substantial computing resources.
Although it is possible to access some LLMs via
APIs (OpenAI, 2023), using them at scale can incur
high costs.
"""

labels = ["Technology", "Person", "Event", "Organization"]

entities = model.predict_entities(text, labels, threshold=0.5)

pprint(entities)

This will give us:

[
  {
    "end": 25,
    "label": "Technology",
    "score": 0.5098580121994019,
    "start": 1,
    "text": "Named Entity Recognition"
  },
  {
    "end": 355,
    "label": "Technology",
    "score": 0.8522480726242065,
    "start": 334,
    "text": "Large Language\nModels"
  },
  {
    "end": 367,
    "label": "Technology",
    "score": 0.9114717245101929,
    "start": 362,
    "text": "GPT-3"
  },
  {
    "end": 381,
    "label": "Person",
    "score": 0.9619215130805969,
    "start": 369,
    "text": "Brown et al."
  },
  {
    "end": 387,
    "label": "Event",
    "score": 0.5342058539390564,
    "start": 383,
    "text": "2020"
  },
  {
    "end": 432,
    "label": "Technology",
    "score": 0.5196287035942078,
    "start": 419,
    "text": "open-type NER"
  },
  {
    "end": 808,
    "label": "Organization",
    "score": 0.9727300405502319,
    "start": 802,
    "text": "OpenAI"
  },
  {
    "end": 814,
    "label": "Event",
    "score": 0.5546731948852539,
    "start": 810,
    "text": "2023"
  }
]

Pretty impressive for a simple pip install and a few lines of code.

Turning this into a RESTful API is also straightforward. First, we need a model.py file:

# model.py
import warnings

from gliner import GLiNER

warnings.filterwarnings("ignore")

model = GLiNER.from_pretrained(
    "./",
    local_files_only=True,
    max_length=512,
)

default_labels = [
    "Brand",
    "Product",
    "Organization",
    "Person",
    "Event",
    "Misc",
    "Location",
    "Service",
    "Industry",
    "Technology",
    "Hobby",
    "Fashion",
    "Automobile",
    "Food",
    "Entertainment",
]

We suppress some annoying warnings, set up the model, and define some default labels.

My primary use case is running NER on Google keywords, hence this particular set of labels, but the specifics don’t matter much: you give the model some text and some labels, and it does its thing.

NOTE: The code above assumes the model is stored locally alongside model.py (this will be important when building the Docker image). You can download it from here.
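
One way to do that is to pull the model files into the project directory with huggingface_hub. A quick sketch (the target directory should be the one model.py loads from):

# download_model.py
from huggingface_hub import snapshot_download

# fetch the GLiNER weights next to model.py so local_files_only=True works
snapshot_download(repo_id="urchade/gliner_mediumv2.1", local_dir=".")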

Next, we set up the API. I will use FastAPI, but you could easily swap it for Flask or some other framework.

# app.py

import os

# Set the Hugging Face cache env vars before the model module is imported,
# so they take effect when the model and its dependencies load.
os.environ["HF_HOME"] = "/tmp/huggingface"
os.environ["TRANSFORMERS_CACHE"] = "/tmp/huggingface"

from fastapi import FastAPI, Request, Response

from model import default_labels, model

app = FastAPI()

@app.get("/ping")
def ping():
    status = 200 if model else 404
    return Response(status_code=status)


@app.post("/invocations")
async def invocations(request: Request):
    input_ = await request.json()
    labels = input_.get("labels", default_labels)

    if input_.get("keywords"):
        predictions = model.batch_predict_entities(
            input_.get("keywords"), labels, threshold=0.5, flat_ner=True
        )
        if predictions:
            return predictions

Note the endpoint names. We need /ping and /invocations for our API to play nice with AWS SageMaker. Read more here.

Also, I use model.batch_predict_entities so I can pass a list of strings directly instead of looping, which speeds things up significantly. This is completely optional and will vary depending on your use case.
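
Before building the image, it’s easy to sanity-check the API locally. A quick sketch using FastAPI’s TestClient (it needs the httpx package, and assumes the model files are already downloaded next to model.py):

# test_app.py
from fastapi.testclient import TestClient

from app import app

client = TestClient(app)

response = client.post("/invocations", json={"keywords": ["nike for tennis"]})
print(response.status_code)  # expect 200
print(response.json())       # one list of entities per keyword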

On to the last and, arguably, nastiest bit: creating the Docker image.

Here’s a minimal Dockerfile for AWS SageMaker:

FROM python:3.12.1-slim

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV TRANSFORMERS_CACHE=/tmp/huggingface
ENV HF_HOME=/tmp/huggingface

WORKDIR /opt/program

EXPOSE 8080

COPY . .

# install dependencies (assuming the API only needs gliner, fastapi and uvicorn)
RUN pip install --no-cache-dir gliner fastapi uvicorn

RUN chmod +x serve

ENV PATH="/opt/program:${PATH}"

ENTRYPOINT [ "serve" ]
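
The Dockerfile expects an executable file named serve at the project root, which I won’t dwell on. Here’s a minimal sketch of one, assuming uvicorn is used to run app.py on port 8080 (the port SageMaker talks to):

#!/usr/bin/env python
# serve -- start the FastAPI app so SageMaker can reach /ping and /invocations
import uvicorn

if __name__ == "__main__":
    uvicorn.run("app:app", host="0.0.0.0", port=8080)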

Build the image and push it to ECR (you’ll need to authenticate Docker with your registry first, e.g. via aws ecr get-login-password). Make sure you build an x86_64 image, passing --platform linux/amd64 to docker build if you’re on an ARM machine; ARM-based images don’t work with SageMaker as of this writing.

# build the image
docker build -t gliner .

# tag the image
docker tag gliner <your-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-repo-name>:<your-image-tag>

# push the image to ECR
docker push <your-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-repo-name>:<your-image-tag>

Create the trust policy file (sagemaker-policy.json) that allows SageMaker to assume the role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sagemaker.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Create the IAM role:

aws iam create-role --role-name SageMakerExecutionRole --assume-role-policy-document file://sagemaker-policy.json

Attach a policy so the role can pull the image from ECR:

aws iam attach-role-policy --role-name SageMakerExecutionRole --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess

Create a SageMaker model:

aws sagemaker create-model --model-name gliner-api \
--execution-role-arn arn:aws:iam::<your-account-id>:role/SageMakerExecutionRole \
--primary-container Image=<your-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-repo-name>:<your-image-tag>

Next, we create an endpoint configuration:

aws sagemaker create-endpoint-config --endpoint-config-name gliner-api-endpoint-config \
--production-variants VariantName=variant1,ModelName=gliner-api,InitialInstanceCount=1,InstanceType=ml.m5.xlarge

And finally, the endpoint itself:

aws sagemaker create-endpoint \
--endpoint-name gliner-api-endpoint \
--endpoint-config-name gliner-api-endpoint-config
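
Endpoint creation takes a few minutes. A quick way to poll its status, sketched with boto3 (assuming your AWS credentials and default region are configured):

import boto3

sagemaker = boto3.client("sagemaker")
status = sagemaker.describe_endpoint(EndpointName="gliner-api-endpoint")["EndpointStatus"]
print(status)  # "Creating" until it flips to "InService"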

Once it’s in service, we can test it with something like this:

# pip install boto3 requests_aws4auth
import boto3
import requests
from requests_aws4auth import AWS4Auth

session = boto3.Session()
credentials = session.get_credentials()
region = "your-region"  # e.g., 'us-east-1'


awsauth = AWS4Auth(
    credentials.access_key,
    credentials.secret_key,
    region,
    "sagemaker",
    session_token=credentials.token,
)


endpoint_name = "gliner-api-endpoint"  # must match the endpoint created above
url = f"https://runtime.sagemaker.{region}.amazonaws.com/endpoints/{endpoint_name}/invocations"


payload = {
    "keywords": [
        "nike v adidas",
        "nike j guard sizing",
        "nike fc",
        "who stocks nike",
        "nike i watch",
        "nike d'tack 60",
        "nike gt cut academy",
        "nike by you",
        "nike n 354 squash type",
        "nike shox nz",
        "nike with white socks",
        "nike for tennis",
    ],
    "labels": ["sports", "entertainment", "product", "technology", "event"],
}

headers = {"Content-Type": "application/json"}
response = requests.post(url, auth=awsauth, json=payload, headers=headers)

print(response.status_code)
print(response.json())

And we get back the result:

200
[[{'start': 0, 'end': 4, 'text': 'nike', 'label': 'product', 'score': 0.7612437605857849}, {'start': 7, 'end': 13, 'text': 'adidas', 'label': 'product', 'score': 0.799613356590271}],
 [{'start': 0, 'end': 4, 'text': 'nike', 'label': 'product', 'score': 0.673353374004364}],
 [],
 [{'start': 11, 'end': 15, 'text': 'nike', 'label': 'product', 'score': 0.9046514630317688}],
 [{'start': 0, 'end': 12, 'text': 'nike i watch', 'label': 'product', 'score': 0.5342481732368469}],
 [{'start': 0, 'end': 14, 'text': "nike d'tack 60", 'label': 'product', 'score': 0.5939035415649414}],
 [{'start': 0, 'end': 4, 'text': 'nike', 'label': 'product', 'score': 0.7377316355705261}],
 [{'start': 0, 'end': 4, 'text': 'nike', 'label': 'product', 'score': 0.7738112807273865}],
 [{'start': 0, 'end': 4, 'text': 'nike', 'label': 'product', 'score': 0.7529258131980896}, {'start': 11, 'end': 22, 'text': 'squash type', 'label': 'sports', 'score': 0.6611686944961548}],
 [{'start': 0, 'end': 9, 'text': 'nike shox', 'label': 'product', 'score': 0.6396805047988892}],
 [{'start': 0, 'end': 4, 'text': 'nike', 'label': 'product', 'score': 0.8681221604347229}],
 [{'start': 0, 'end': 4, 'text': 'nike', 'label': 'product', 'score': 0.8459076285362244}, {'start': 9, 'end': 15, 'text': 'tennis', 'label': 'sports', 'score': 0.700456440448761}]]

To avoid unpleasant billing surprises, we can delete the above resources like so:

# delete the endpoint
aws sagemaker delete-endpoint \
--endpoint-name gliner-api-endpoint

# delete the endpoint config
aws sagemaker delete-endpoint-config \
--endpoint-config-name gliner-api-endpoint-config

# delete the model
aws sagemaker delete-model \
--model-name gliner-api