Brewing a Coffee Recommender (Part 2)
This is part two of a two-part series. Part one focuses on NLP and building a recommender. This article focuses on the deployment of my coffee recommender on Streamlit.
In part one of this mini-series, I walked through some of the process for scraping, preprocessing, and performing topic modeling on coffee reviews to build a recommendation system. Picking up from where I left off, I wanted to deploy my model, along with a few other tools, to an online app that others could access. While there are many options for this (Flask, Heroku, etc.), I will focus on using Streamlit for the entire process. You can find the app running here.
From an easy-to-use syntax that I can write in a Jupyter Notebook or any text editor to an active online community, Streamlit offers several appealing advantages. In just a few minutes, I can create a simple program, run it locally, and see how my app will look in a browser window. Streamlit can also host your app through its Streamlit Sharing tool, so the app does not have to run on your local machine. Using Sharing, Streamlit connects to your GitHub and reads from the paths you specify to run your .py app, read any files or pickles it calls for, and create an easily modified web user interface.
To begin, Streamlit offers a walkthrough of setup and some initial functions whose syntax you may want to know. But first, make sure you’ve installed Streamlit on your local machine. It really is as simple as two lines from your terminal to get started:
pip install streamlit
streamlit hello
These two lines will install Streamlit in your current Python environment and run the hello app. From there, you should see a sample app open in your browser window. All that is needed to do the same for your own app is to create your .py file (e.g. my_app.py) and run it instead of hello:
streamlit run my_app.py
If you want to print text on your app screen, there are three main options:
- Use the st.write() command the same way you might use print() in Python
- Use the st.title() command similarly, for a title
- Use magic commands: put a string literal or a variable on its own line, and Streamlit will apply st.write() for you.
''' Place text here '''
df = pd.DataFrame(...)
Similar functionality exists for plotting graphs, drawing maps, creating checkboxes, and more. I’d also recommend keeping this handy cheat sheet available while writing the code. For my coffee recommendation app, the Streamlit Community pages were an invaluable resource for useful tips and tools. These included creating a radio button for adding multiple pages to my app, adding Plotly graphs, and creating a sidebar for navigation.
I carefully collected pickles of all my models and necessary dataframes to be loaded at the beginning of the program. These included the TF-IDF vectorizer, the NMF model, the coffee dataframe full of text and scores, the linear regression and random forest models for predicting scores, and more. In short, I wanted every model needed both to reuse my past work and to transform any new user input.
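To make the two ideas above concrete, here is a minimal sketch of loading a pickle and using a sidebar radio button as page navigation. The page names, the page functions, and the load_pickle helper are all hypothetical and are not taken from my actual app; only st.sidebar.radio, st.title, and st.write are real Streamlit calls.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Read one pickled object (a model, vectorizer, or dataframe) from disk."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def show_recommender(st):
    """Draw the (hypothetical) recommendation page."""
    st.title("Coffee Recommender")
    st.write("Describe a coffee you enjoy to see similar ones.")


def show_score_predictor(st):
    """Draw the (hypothetical) score prediction page."""
    st.title("Score Predictor")
    st.write("Estimate a review score from a description.")


# Hypothetical page names, each mapped to the function that draws that page.
PAGES = {
    "Recommender": show_recommender,
    "Score predictor": show_score_predictor,
}


def main():
    # Imported here so the helpers above can be used without Streamlit installed.
    import streamlit as st

    # A radio button in the sidebar doubles as page navigation.
    choice = st.sidebar.radio("Go to", list(PAGES))
    PAGES[choice](st)
```

Saved as my_app.py with a closing call to main(), a file like this would be launched with streamlit run my_app.py. Each model would be read once near the top of the file with load_pickle, so every page can share it.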
For the remainder of the article, I want to focus on a couple of items that were crucial to deployment and might be easily overlooked. Once you have been accepted for Streamlit Sharing and have allowed Streamlit access to your GitHub, your GitHub repository will function as the directory for all the files in your app.
The file structure that worked best for me was a folder within my repository dedicated to the Streamlit app. If you click the link, you’ll see my full repository and a folder in it titled “web_app.” In the web_app folder, I placed all my pickles and my .py file. This is the folder I pointed Streamlit to when deploying the app.
What took me a little while to figure out was that in the .py file, when I am loading a file (such as a .pickle), the file path must have /app/ at the beginning. So, for example, if I wanted to load the file nmf_tfidfblind.pickle (my NMF model based on TFIDF of blind reviews) in my web_app folder, it would have the path:
The other key item to notice in the path above is that my GitHub repository is named “Coffee-Review-NLP,” but in the path it is all lower case. This took a bit of trial and error, but making everything lower case solved my problem of Streamlit not finding the needed files.
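The two rules above (the /app/ prefix and the lower-cased repository name) can be sketched as a small helper. The sharing_path function is hypothetical, and the order of the pieces (repository, then web_app, then the file) is my assumption about the layout, so check it against what your own deployment reports.

```python
from pathlib import PurePosixPath


def sharing_path(repo_name, *parts):
    """Build an absolute path under /app/ with the repo name lower-cased."""
    return str(PurePosixPath("/app", repo_name.lower(), *parts))


# Assumed layout: /app/<repo in lower case>/<folder>/<file>
model_path = sharing_path("Coffee-Review-NLP", "web_app", "nmf_tfidfblind.pickle")
print(model_path)
```

Building every path through one helper means that a change in folder layout only needs to be fixed in one place.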
Additionally, you’ll need to let Streamlit know which packages, and which versions of them, you used in creating your app. These are all stored in a requirements.txt file, which is just a list of packages and their versions. To generate a requirements file, I used pipreqs, working from the directory in which my .py file is stored:
pip install pipreqs
Additionally, to confirm the versions were correct, I imported the packages in my Jupyter Notebook and ran lines such as pandas.__version__ to check them against what was listed in the requirements.txt file. I only ended up doing that after running into compile errors and realizing that one of my packages had the wrong version listed. This requirements.txt file should be saved not in your web_app folder, but at the top level of the repository.
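As an alternative to checking each package by hand, the standard library can report installed versions in the same name==version form that requirements.txt uses. This is only a sketch: the pinned_requirements helper and the package list are hypothetical, and the list should be replaced with whatever your own app imports.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return 'name==version' lines for the packages installed locally."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Flag the gap instead of guessing at a version number.
            lines.append(f"# {name} is not installed in this environment")
    return lines


# Hypothetical package list; replace with the packages your app actually uses.
for line in pinned_requirements(["streamlit", "pandas", "scikit-learn"]):
    print(line)
```

Comparing this output with the generated requirements.txt makes a mismatched version easy to spot before deploying.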
Finally, log in to Streamlit Sharing, point Streamlit to your repo and folder, and hit deploy! A new tab will appear in which your app will “cook” and compile. If there are any errors, you’ll be alerted and can go back to modify your .py code as needed. Once the app is deployed, all you need to do is update the code, commit with Git, and push to GitHub. Streamlit will rerun on its own with your new code!
Please feel free to reach out with any questions about my articles here or about Streamlit in general. Here is a link to my app, for your consideration. Thanks for reading!