How to Build Your Own GPT-J Playground

When OpenAI released a playground for its GPT-3 model, the community was quick to create all sorts of impressive demos, many of which can be found in the Awesome GPT-3 Github repo.

But what if we wanted to create our very own text generation playground? GPT-3 is proprietary and using the API to generate text would cost us. It would also mean that we would have to send our data to OpenAI. Ideally, what we would like to do instead is host an open-source text generation model and a playground app in an environment we control.

Well, it looks like we’re in luck. In 2021, Eleuther AI created GPT-J, an open source text generation model to rival GPT-3. And, of course, the model is available on the Hugging Face (HF) Model Hub, which means we can leverage the HF integration in Amazon SageMaker to easily deploy the model. And to create a web interface to interact with the model we can use Streamlit, which allows us to write web apps by just using Python 🙂

Let’s get started then! The entire code for the tutorial can be found in this Github repo.

Deploying the GPT-J model

The folks at Hugging Face have made it super easy, barely an inconvenience, to deploy a 6B-parameter model like GPT-J on Amazon SageMaker (SM). Philipp Schmid from the HF team outlines the process in his blog post — all that is required is around 24 lines of code:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

# IAM role with permissions to create endpoint
role = sagemaker.get_execution_role()

# public S3 URI to gpt-j artifact (see the blog post for the exact URI)
model_uri = "<public S3 URI to the GPT-J model artifact>"

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_uri,
    transformers_version='4.12.3',  # versions as used in the original blog post
    pytorch_version='1.9.1',
    py_version='py38',
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,       # number of instances
    instance_type='ml.g4dn.xlarge', # 'ml.p3.2xlarge' # ec2 instance type
)
Code to deploy GPT-J on SageMaker (Credit: https://www.philschmid.de/deploy-gptj-sagemaker)

Once we run this code in a SM notebook it will take a few minutes for the endpoint to be deployed. The SM console will tell us when the endpoint is ready:

Image by author

Creating a web interface

Setting up EC2

Once the model is deployed on a SageMaker endpoint we can run inference requests right there in our notebook. But why stop there? With the power of Streamlit we can easily create our very own GPT-J playground!

All we need is a server where we can host a Streamlit application and access the SM endpoint. If we choose to do this on an EC2 instance in the same account as the endpoint, we can access the endpoint via the API provided by SM:

Image by author

All we need to do is provision a small EC2 instance — a t3.medium, for example, is sufficient, since it will only host the web app. Once the instance is up and running, we install Miniconda and then the required libraries via pip:

 # download & install Miniconda
 wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
 bash ~/miniconda.sh -b -p ~/miniconda
 echo "export PATH=$PATH:$HOME/miniconda/bin" >> ~/.bashrc
 source ~/.bashrc
 # install the streamlit & boto3 (AWS SDK for Python) libraries
 pip3 install streamlit boto3

To connect to our endpoint, we also need to provide AWS credentials on the EC2 instance so that it is allowed to call SageMaker. There are several ways to do this; for more information, check out the AWS documentation. In this tutorial I will use a shared credentials file.
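The shared credentials file lives at ~/.aws/credentials. A minimal version might look like the sketch below — the key values are placeholders, and the region typically goes into the companion ~/.aws/config file:

```ini
[default]
aws_access_key_id = <YOUR_ACCESS_KEY_ID>
aws_secret_access_key = <YOUR_SECRET_ACCESS_KEY>
```

With this file in place, boto3 will pick up the credentials automatically when we create the client in the next step.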

Create Streamlit app

Now we can develop our Streamlit application, which is very straightforward. Let’s first define a method that will take care of pre- and post-processing the model payload and response:

import json
import boto3

# client for invoking the SageMaker endpoint
sagemaker_runtime = boto3.client('sagemaker-runtime')

def generate_text(prompt):
    payload = {"inputs": prompt}
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName='<your-endpoint-name>',  # name shown in the SM console
        ContentType='application/json',
        Body=json.dumps(payload),
    )
    result = json.loads(response['Body'].read().decode())
    return result[0]['generated_text']

Now we need a text field in which we can enter our prompt, call our generate_text() method, and display the model response:

import streamlit as st

st.header("My very own GPT-J Playground")
prompt = st.text_area("Enter your prompt here:")

if st.button("Run"):
    generated_text = generate_text(prompt)
    st.write(generated_text)

And that is it! Amazingly easy, isn’t it?

Testing the app

Let’s test this setup! We start the Streamlit app with the streamlit run <python-script> command. By default the app runs on port 8501, so we need to make sure that port 8501 is not blocked (on an EC2 instance this means the corresponding security group needs to be updated to let traffic to port 8501 pass through). Once the app is up and running we can access it via http://<EC2-IP-ADDRESS>:8501.

We will see a textbox where we can enter our prompt and a Run button that will call our GPT-J model. Let’s enter a prompt into the textbox and run the model. After a few seconds it should return with generated text:

Image by author

And that’s it — we have just generated our first text with a GPT-J model in our own playground app!


There are many ways in which this playground could be improved, for example:

  • The newly generated text could be used as a new prompt for the model. That would allow us to continue generating text on the same topic.
  • Preventing the model from stopping in the middle of a sentence.
  • Introducing parameters that influence the generated text, such as temperature and top-p. You can find out more about these parameters, and text generation in general, in this blog post.
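As a sketch of the last idea: the Hugging Face inference toolkit accepts a "parameters" object alongside the inputs, so generate_text() could build its payload with a helper like the one below. The helper name and the parameter values are illustrative, not tuned:

```python
import json

def build_payload(prompt, temperature=0.7, top_p=0.9, max_new_tokens=50):
    """Build a request payload with generation parameters (values are illustrative)."""
    return {
        "inputs": prompt,
        "parameters": {
            "temperature": temperature,      # higher = more random sampling
            "top_p": top_p,                  # nucleus sampling cutoff
            "max_new_tokens": max_new_tokens,
            "do_sample": True,               # enable sampling instead of greedy decoding
        },
    }

# serializes to JSON just like the simple {"inputs": prompt} payload
body = json.dumps(build_payload("My very own GPT-J"))
```

The resulting body string can be passed to invoke_endpoint() in place of the plain payload, with sliders in the Streamlit UI feeding the keyword arguments.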

Article originally posted here by Heiko Hotz. Reposted with permission.

ODSC Community

The Open Data Science community is passionate and diverse, and we always welcome contributions from data science professionals! All of the articles under this profile are from our community, with individual authors mentioned in the text itself.