LLM Serving

Working on LLMs often requires running a demo for real-time testing. Sometimes we have to set things up so that co-workers can play with our model and find issues. An easy way is to use Flask.

import os

import flask

app = flask.Flask(__name__)

@app.route('/')
def index():
    return "<h3>My LLM Playground</h3>"

if __name__ == '__main__':
    # Listen on the port given by the ApiServicePort environment variable.
    app.run(port=int(os.environ['ApiServicePort']))

Start the Server

To start the server, we can run

ApiServicePort=xxxx python3 serve.py

Front-End

If we use Flask's render_template to provide the front end, then we can use the following two ways to launch the app:

# method 1
flask run

# method 2
python3 app.py
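As a sketch of what such an app.py might look like, here is a minimal front end. It uses render_template_string so the example is self-contained (render_template works the same way but loads the page from a templates/ directory); the PAGE template and the placeholder model call are assumptions for illustration, not part of the original setup.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline HTML template; with render_template this would live in templates/.
PAGE = """
<h3>My LLM Playground</h3>
<form method="post">
  <textarea name="prompt"></textarea>
  <button type="submit">Generate</button>
</form>
<pre>{{ reply }}</pre>
"""

@app.route("/", methods=["GET", "POST"])
def index():
    reply = ""
    if request.method == "POST":
        prompt = request.form.get("prompt", "")
        # Placeholder standing in for a real model call.
        reply = f"(model output for: {prompt})"
    return render_template_string(PAGE, reply=reply)
```

With this file saved as app.py, either `flask run` or `python3 app.py` (after adding an `app.run()` entry point) serves the page.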

Another way is to use Streamlit. Streamlit is an open-source Python library that allows developers to create web applications for data science and machine learning projects with minimal effort. It is designed to simplify the process of turning data scripts into shareable web apps, enabling users to interact with data and models through a web browser. If we use Streamlit, we can run

streamlit run app.py

Usually we first start the server and specify the port to listen on, then pull up the front-end page.

The page will look like the following: simple and easy!

[Screenshot: LLM Playground page]
