Quick and Easy NER REST API with GLiNER and AWS Sagemaker
18/10/2024
In this post, we take the Generalist and Lightweight Model for Named Entity Recognition (GLiNER) for a spin. It’s an awesome Named Entity Recognition model that lets you define your own custom entity labels. It’s also pretty resource-efficient and apparently holds its own against LLMs in zero-shot scenarios.
Here’s a basic example of how it works:
from gliner import GLiNER
from pprint import pprint
model = GLiNER.from_pretrained("urchade/gliner_mediumv2.1")
text = """
Named Entity Recognition plays a crucial role in
various real-world applications, such as constructing knowledge graphs. Traditional NER models are
limited to a predefined set of entity types. Expanding the number of entity types can be beneficial for
many applications but may involve labeling additional datasets. The emergence of Large Language
Models, like GPT-3 (Brown et al., 2020), has introduced a new era for open-type NER by enabling
the identification of any types of entity types only
by natural language instruction. This shift signifies a significant departure from the inflexibility
observed in traditional models. However, powerful
LLMs typically consist of billions of parameters
and thus require substantial computing resources.
Although it is possible to access some LLMs via
APIs (OpenAI, 2023), using them at scale can incur
high costs.
"""
labels = ["Technology", "Person", "Event", "Organization"]
entities = model.predict_entities(text, labels, threshold=0.5)
pprint(entities)
This will give us:
[
  {
    "end": 25,
    "label": "Technology",
    "score": 0.5098580121994019,
    "start": 1,
    "text": "Named Entity Recognition"
  },
  {
    "end": 355,
    "label": "Technology",
    "score": 0.8522480726242065,
    "start": 334,
    "text": "Large Language\nModels"
  },
  {
    "end": 367,
    "label": "Technology",
    "score": 0.9114717245101929,
    "start": 362,
    "text": "GPT-3"
  },
  {
    "end": 381,
    "label": "Person",
    "score": 0.9619215130805969,
    "start": 369,
    "text": "Brown et al."
  },
  {
    "end": 387,
    "label": "Event",
    "score": 0.5342058539390564,
    "start": 383,
    "text": "2020"
  },
  {
    "end": 432,
    "label": "Technology",
    "score": 0.5196287035942078,
    "start": 419,
    "text": "open-type NER"
  },
  {
    "end": 808,
    "label": "Organization",
    "score": 0.9727300405502319,
    "start": 802,
    "text": "OpenAI"
  },
  {
    "end": 814,
    "label": "Event",
    "score": 0.5546731948852539,
    "start": 810,
    "text": "2023"
  }
]
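Each prediction is a plain dict with character offsets into the input text, which makes post-processing straightforward. As a quick illustration (this helper is mine, not part of GLiNER's API), filtering down to high-confidence entities looks like:

```python
def top_entities(entities, min_score=0.7):
    """Keep predictions scoring at or above min_score, as (label, text) pairs."""
    return [(e["label"], e["text"]) for e in entities if e["score"] >= min_score]

# a few of the predictions from above:
predictions = [
    {"label": "Technology", "score": 0.8522480726242065, "text": "Large Language\nModels"},
    {"label": "Technology", "score": 0.5196287035942078, "text": "open-type NER"},
    {"label": "Person", "score": 0.9619215130805969, "text": "Brown et al."},
]
print(top_entities(predictions))
# [('Technology', 'Large Language\nModels'), ('Person', 'Brown et al.')]
```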
Pretty impressive for a simple pip install and a few lines of code.
Turning this into a RESTful API is also straightforward. First, we need a model.py file:
# model.py
from gliner import GLiNER
import warnings

warnings.filterwarnings("ignore")

model = GLiNER.from_pretrained(
    "./",
    local_files_only=True,
    max_length=512,
)

default_labels = [
    "Brand",
    "Product",
    "Organization",
    "Person",
    "Event",
    "Misc",
    "Location",
    "Service",
    "Industry",
    "Technology",
    "Hobby",
    "Fashion",
    "Automobile",
    "Food",
    "Entertainment",
]
We suppress some annoying warnings, set up the model and some default labels.
My primary use case involves running NER on Google keywords, hence this particular set of labels, but this doesn’t really matter that much - you give the model some text and some labels and it will do its thing.
NOTE: The code above assumes the model is stored locally (this will be important when creating the Docker image). You can download it from here.
Next, we set up the API. I will use FastAPI, but you could easily swap it for Flask or some other framework.
# app.py
import os

# set the cache locations before the model (and transformers) get imported
os.environ["HF_HOME"] = "/tmp/huggingface"
os.environ["TRANSFORMERS_CACHE"] = "/tmp/huggingface"

from fastapi import FastAPI, Request, Response

from model import default_labels, model

app = FastAPI()


@app.get("/ping")
def ping():
    status = 200 if model else 404
    return Response(status_code=status)


@app.post("/invocations")
async def invocations(request: Request):
    input_ = await request.json()
    labels = input_.get("labels", default_labels)
    keywords = input_.get("keywords")
    if not keywords:
        return []
    return model.batch_predict_entities(
        keywords, labels, threshold=0.5, flat_ner=True
    )
Note the endpoint names. We need to expose /ping and /invocations in order for our API to play nice with AWS Sagemaker. Read more here.
Also, I use model.batch_predict_entities so I can pass a list of strings directly instead of looping, which speeds things up significantly. This is completely optional and will vary depending on your use case.
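If request payloads get large, you might also want to cap how many keywords the model sees per call. A hypothetical chunking helper (not part of GLiNER) could look like this:

```python
def chunked(items, size=32):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i : i + size]

# sketch: feed the model fixed-size batches instead of everything at once
# predictions = []
# for batch in chunked(keywords, size=32):
#     predictions.extend(model.batch_predict_entities(batch, labels, threshold=0.5))
print(list(chunked(["a", "b", "c", "d", "e"], size=2)))
# [['a', 'b'], ['c', 'd'], ['e']]
```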
On to the last and, arguably, nastiest bit: creating the Docker image. Here’s a minimal Dockerfile for AWS Sagemaker:
FROM python:3.12.1-slim

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV TRANSFORMERS_CACHE=/tmp/huggingface
ENV HF_HOME=/tmp/huggingface

WORKDIR /opt/program
EXPOSE 8080

COPY . .
# install dependencies (gliner, fastapi, uvicorn, ...);
# assumes a requirements.txt lives alongside the code
RUN pip install --no-cache-dir -r requirements.txt
RUN chmod +x serve

ENV PATH="/opt/program:${PATH}"
ENTRYPOINT [ "serve" ]
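The ENTRYPOINT refers to a serve executable that isn’t shown above. A minimal sketch, assuming the FastAPI app lives in app.py and uvicorn is installed, might be:

```shell
#!/bin/sh
# serve - run by SageMaker when the container starts;
# binds the FastAPI app to port 8080, the port SageMaker expects.
exec uvicorn app:app --host 0.0.0.0 --port 8080
```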
Build the image and push it to ECR. Make sure it’s an x86_64-based image; arm-based images don’t work with Sagemaker as of this writing.
# authenticate Docker with ECR
aws ecr get-login-password --region <your-ecr-repo-region> | docker login --username AWS --password-stdin <your-ecr-repo>.dkr.ecr.<your-ecr-repo-region>.amazonaws.com
# build the image
docker build -t gliner .
# tag the image
docker tag gliner <your-ecr-repo>.dkr.ecr.<your-ecr-repo-region>.amazonaws.com/<your-image-name>:<your-image-tag>
# push the image to ECR
docker push <your-ecr-repo>.dkr.ecr.<your-ecr-repo-region>.amazonaws.com/<your-image-name>:<your-image-tag>
Create the policy file (sagemaker-policy.json):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sagemaker.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Create the IAM role:
aws iam create-role --role-name SageMakerExecutionRole --assume-role-policy-document file://sagemaker-policy.json
Attach the policy:
aws iam attach-role-policy --role-name SageMakerExecutionRole --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
Create a Sagemaker model:
aws sagemaker create-model --model-name gliner-api \
    --execution-role-arn arn:aws:iam::<your-account-id>:role/SageMakerExecutionRole \
    --primary-container Image=<your-ecr-repo>.dkr.ecr.<your-ecr-repo-region>.amazonaws.com/<your-image-name>:<your-image-tag>
Next, we create an endpoint configuration:
aws sagemaker create-endpoint-config --endpoint-config-name gliner-api-endpoint-config \
--production-variants VariantName=variant1,ModelName=gliner-api,InitialInstanceCount=1,InstanceType=ml.m5.xlarge
And finally, the endpoint itself:
aws sagemaker create-endpoint \
--endpoint-name gliner-api-endpoint \
--endpoint-config-name gliner-api-endpoint-config
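Endpoint creation takes several minutes. Rather than polling the console, you can block until it’s ready with the CLI’s built-in waiter:

```shell
# block until the endpoint reaches the InService state
aws sagemaker wait endpoint-in-service --endpoint-name gliner-api-endpoint

# confirm the status
aws sagemaker describe-endpoint \
    --endpoint-name gliner-api-endpoint \
    --query EndpointStatus
```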
Once it’s in service we can test it with something like this:
pip install boto3 requests requests_aws4auth
import boto3
import requests
from requests_aws4auth import AWS4Auth

session = boto3.Session()
credentials = session.get_credentials()
region = "your-region"  # e.g., 'us-east-1'

awsauth = AWS4Auth(
    credentials.access_key,
    credentials.secret_key,
    region,
    "sagemaker",
    session_token=credentials.token,
)

endpoint_name = "gliner-api-endpoint"
url = f"https://runtime.sagemaker.{region}.amazonaws.com/endpoints/{endpoint_name}/invocations"

payload = {
    "keywords": [
        "nike v adidas",
        "nike j guard sizing",
        "nike fc",
        "who stocks nike",
        "nike i watch",
        "nike d'tack 60",
        "nike gt cut academy",
        "nike by you",
        "nike n 354 squash type",
        "nike shox nz",
        "nike with white socks",
        "nike for tennis",
    ],
    "labels": ["sports", "entertainment", "product", "technology", "event"],
}

headers = {"Content-Type": "application/json"}
response = requests.post(url, auth=awsauth, json=payload, headers=headers)
print(response.status_code)
print(response.json())
And we get back the result:
200
[[{'start': 0, 'end': 4, 'text': 'nike', 'label': 'product', 'score': 0.7612437605857849},
  {'start': 7, 'end': 13, 'text': 'adidas', 'label': 'product', 'score': 0.799613356590271}],
 [{'start': 0, 'end': 4, 'text': 'nike', 'label': 'product', 'score': 0.673353374004364}],
 [],
 [{'start': 11, 'end': 15, 'text': 'nike', 'label': 'product', 'score': 0.9046514630317688}],
 [{'start': 0, 'end': 12, 'text': 'nike i watch', 'label': 'product', 'score': 0.5342481732368469}],
 [{'start': 0, 'end': 14, 'text': "nike d'tack 60", 'label': 'product', 'score': 0.5939035415649414}],
 [{'start': 0, 'end': 4, 'text': 'nike', 'label': 'product', 'score': 0.7377316355705261}],
 [{'start': 0, 'end': 4, 'text': 'nike', 'label': 'product', 'score': 0.7738112807273865}],
 [{'start': 0, 'end': 4, 'text': 'nike', 'label': 'product', 'score': 0.7529258131980896},
  {'start': 11, 'end': 22, 'text': 'squash type', 'label': 'sports', 'score': 0.6611686944961548}],
 [{'start': 0, 'end': 9, 'text': 'nike shox', 'label': 'product', 'score': 0.6396805047988892}],
 [{'start': 0, 'end': 4, 'text': 'nike', 'label': 'product', 'score': 0.8681221604347229}],
 [{'start': 0, 'end': 4, 'text': 'nike', 'label': 'product', 'score': 0.8459076285362244},
  {'start': 9, 'end': 15, 'text': 'tennis', 'label': 'sports', 'score': 0.700456440448761}]]
To avoid unpleasant billing surprises we can delete the above resources like so:
# delete the endpoint
aws sagemaker delete-endpoint \
--endpoint-name gliner-api-endpoint
# delete the endpoint config
aws sagemaker delete-endpoint-config \
--endpoint-config-name gliner-api-endpoint-config
# delete the model
aws sagemaker delete-model \
--model-name gliner-api
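If you’re fully done with the project, you can also remove the image from ECR (assuming the repository name matches your image name):

```shell
# delete the pushed image from ECR
aws ecr batch-delete-image \
    --repository-name <your-image-name> \
    --image-ids imageTag=<your-image-tag>
```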