Use Gemini-1.5-Pro-Latest for smarter eating

Learn how to use Google’s latest Gemini-1.5-pro model to develop a generative AI calorie counting app

Photo by Pickled Stardust on Unsplash

Have you ever wondered how many calories you consume when you eat, say, dinner? I do it all the time. Wouldn’t it be great if you could just send a photo of your plate through an app and get an estimate of the total calories before you decide how far you want to dig in?

This calorie counter app I made can help you with that. It is a Python application that uses Google’s Gemini-1.5-Pro-Latest model to estimate the number of calories in food.

The app accepts two inputs: a question about the food and a picture of a single food item, several items, or a whole plate of food. It returns an answer to the question, the total number of calories in the picture, and a breakdown of calories per item.

In this article, I’ll explain the end-to-end process of building the app from scratch using Gemini-1.5-pro-latest, a generative AI large language model released by Google. I’ll also explain how I developed the front-end of the application using Streamlit.

It is worth noting that, as the world of AI advances, it is the duty of data scientists to move gradually from traditional deep learning to generative AI techniques and so transform their role. Teaching that shift is my main goal in covering this topic.

I’ll start with a quick explanation of Gemini-1.5-pro-latest and the Streamlit framework. These are the main components in the infrastructure of this calorie counter app.

Gemini-1.5-pro-latest

Gemini-1.5-pro-latest is an advanced AI language model developed by Google. As the latest version at the time of writing, it offers faster response times and better accuracy than earlier versions, both for natural language processing and for building applications.

This is a multimodal model that works with both text and images. It is an improvement over the Google Gemini pro model, which only works with text prompts.

The model works by understanding and generating text, just like humans, based on prompts given to it. In this article, this model is used to generate text for our calorie counter app.

Gemini-1.5-pro-latest can be integrated into other applications to enhance their AI capabilities. In this application, the model is prompted to identify the individual food items in the uploaded image, estimate the calories of each item from its general knowledge of food, and then add up the calories for all items in the image.

Streamlit

Streamlit is an open-source Python framework that manages the user interface. It simplifies web development so that you don’t have to write any HTML or CSS for the front-end of the project.

Let’s start building the app.

Building your calorie counter application

I will show you how to build the app in 5 clear steps.

1. Set up your folder structure

To get started, open your favorite code editor (mine is VS Code) and create a project folder. Name it something like Calories-Counter. This is your working directory. Create a virtual environment (venv), activate it in your terminal, then create the following files: .env, calories.py, and requirements.txt.

Here is the recommended folder structure:

Calories-Counter/
├── venv/
│ ├── xxx
│ ├── xxx
├── .env
├── calories.py
└── requirements.txt

Please note that Gemini-1.5-Pro works best with Python versions 3.9 and later.
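
If you want the script to stop early on an older interpreter, a small guard along these lines could go at the top of calories.py. This is only a sketch: the function name python_is_supported and the MIN_PYTHON constant are made up for illustration, and the 3.9 minimum is simply taken from the note above.

```python
import sys

# Assumption: the minimum version comes from the note above (3.9 and later).
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor, e.g. (3, 11) >= (3, 9)
    return tuple(version_info[:2]) >= tuple(minimum)


print("Supported Python version:", python_is_supported())
```

In the app itself you might raise SystemExit with a short message when the check returns False, rather than just printing the result.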

2. Get the Google API key

Like other Gemini models, Gemini-1.5-pro-latest is currently free for public use. To access it, you need an API key, which you can generate in Google AI Studio under “Get API Key”. Once the key is generated, copy it and save it as an environment variable in the .env file as follows.

GOOGLE_API_KEY="paste the generated key here"

3. Install dependencies

List the following libraries in your requirements.txt file.

  • streamlit
  • google-generativeai
  • python-dotenv

Install the libraries in requirements.txt in the terminal with:

python -m pip install -r requirements.txt
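
After installing, a quick way to confirm that everything is importable is to look up each module without running the app. The sketch below uses only the standard library. The helper name missing_modules is hypothetical, and the list contains the import names the script uses (for example dotenv and PIL), which differ from the pip package names.

```python
from importlib.util import find_spec

# Import names used by calories.py (not the pip package names)
REQUIRED_MODULES = ["streamlit", "google.generativeai", "dotenv", "PIL"]


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be found."""
    missing = []
    for name in module_names:
        try:
            if find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            # Raised for dotted names when the parent package is absent
            missing.append(name)
    return missing
```

Calling missing_modules(REQUIRED_MODULES) inside the activated virtual environment should return an empty list once the installation has succeeded.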

4. Write the Python script

Now let’s start writing the Python script in calories.py. Import all the required libraries with the following code:

# import the libraries
from dotenv import load_dotenv
import streamlit as st
import os
import google.generativeai as genai
from PIL import Image

Here you can see how the different imported modules are used:

  • dotenv: since this application is configured from a Google API key stored as an environment variable, dotenv is used to load the configuration from the .env file.
  • streamlit: used to create an interactive user interface for the front-end.
  • os: used to read environment variables, such as the API key loaded from the .env file.
  • google.generativeai: gives us access to the Gemini model that we are going to use.
  • PIL: the Python Imaging Library (installed as the Pillow package), used to open and handle image files.

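The following lines load the environment variables from the .env file and configure the Gemini client with the API key. Note that load_dotenv() has to run first, so that the key is available when os.getenv() reads it.

load_dotenv()

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))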
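
If the key is missing, os.getenv() silently returns None and the error only shows up later, when the model is called. A small check like the one below makes the failure easier to spot. It is a sketch rather than part of the original app: the helper name require_api_key is hypothetical, and the placeholder string it rejects is the one from the .env example above.

```python
import os


def require_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key from `env`, or raise a clear error if it is unusable."""
    if env is None:
        env = os.environ
    key = (env.get(name) or "").strip()
    # Reject an unset key and the placeholder text from the .env example
    if not key or key == "paste the generated key here":
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

In calories.py, the returned value could be passed to genai.configure(api_key=...) in place of the bare os.getenv() call, after the .env file has been loaded.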

Define a function that, when called, loads the Gemini-1.5-pro-latest model and returns its response, as follows:

def get_gemini_reponse(input_prompt, image, user_prompt):
    # Load the model and send it the input prompt, the image, and the user's question
    model = genai.GenerativeModel('gemini-1.5-pro-latest')
    response = model.generate_content([input_prompt, image[0], user_prompt])
    return response.text

In the above function you can see that it takes as input the input prompt specified later in the script, an image supplied by the user, and a prompt/question supplied by the user. All of that goes into the Gemini model, which returns the response text.

In this app, the image is sent to Gemini-1.5-pro as raw bytes together with its MIME type, so the next step is to write a function that reads the uploaded image and converts it to bytes.

def input_image_setup(uploaded_file):
    # Check if a file has been uploaded
    if uploaded_file is not None:
        # Read the file into bytes
        bytes_data = uploaded_file.getvalue()

        # One-element list, so the caller can pass image[0] to the model
        image_parts = [
            {
                "mime_type": uploaded_file.type,  # Get the MIME type of the uploaded file
                "data": bytes_data
            }
        ]
        return image_parts
    else:
        raise FileNotFoundError("No file uploaded")
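
To see what this helper produces without starting Streamlit, you can feed it a stand-in object. In the sketch below, FakeUpload is a hypothetical class that mimics only the two things the function relies on, a type attribute and a getvalue() method. The function is repeated so the snippet runs on its own, and it is assumed to return a one-element list. The bytes are dummy data, not a real image.

```python
from dataclasses import dataclass


@dataclass
class FakeUpload:
    """Hypothetical stand-in for the object returned by st.file_uploader."""
    data: bytes
    type: str  # MIME type, e.g. "image/png"

    def getvalue(self) -> bytes:
        return self.data


def input_image_setup(uploaded_file):
    # Same logic as in the article, condensed
    if uploaded_file is None:
        raise FileNotFoundError("No file uploaded")
    return [{"mime_type": uploaded_file.type, "data": uploaded_file.getvalue()}]


# Dummy bytes only; a real upload would contain the full image file
parts = input_image_setup(FakeUpload(data=b"\x89PNG...", type="image/png"))
print(parts[0]["mime_type"], len(parts[0]["data"]))
```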

Next, provide the input prompt that defines the behavior of your app. Here, we simply tell Gemini what to do with the text and image that the user feeds to the app.

input_prompt="""
You are an expert nutritionist.
You should answer the question entered by the user in the input based on the uploaded image you see.
You should also look at the food items found in the uploaded image and calculate the total calories.
Also, provide the details of every food item with calories intake in the format below:

1. Item 1 - no of calories
2. Item 2 - no of calories
----
----

"""
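
Because the prompt asks for a fixed "number. item - calories" layout, the per-item lines can in principle be parsed so you can check the total yourself. The sketch below is one possible way to do that and is not part of the original app. The helper name parse_calorie_lines and the sample text are made up for illustration, and it assumes the model follows the requested format, which it may not always do.

```python
import re

# Matches lines such as "2. Stir-fried rice - 300 calories"
ITEM_LINE = re.compile(
    r"^[ \t]*\d+\.[ \t]*(.+?)[ \t]*-[ \t]*(\d+(?:\.\d+)?)",
    re.MULTILINE,
)


def parse_calorie_lines(response_text):
    """Return a list of (item name, calories) pairs found in the text."""
    return [(name, float(value)) for name, value in ITEM_LINE.findall(response_text)]


# Made-up sample in the format requested by the prompt
sample = """Total calories: about 650

1. Grilled chicken - 250 calories
2. Stir-fried rice - 300 calories
3. Side salad - 100 calories
"""

items = parse_calorie_lines(sample)
total = sum(calories for _, calories in items)
print(items, total)
```

Lines that deviate from the layout (for example, a calorie range or an extra " - 12 oz" before the number) would be skipped or misread, so treat the parsed total as a sanity check rather than a replacement for the model's answer.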

The next step is to initialize Streamlit and create a simple user interface for your calorie counter app.

st.set_page_config(page_title="Gemini Calorie Counter App")
st.header("Calorie Counter App")
input = st.text_input("Ask any question related to your food: ", key="input")
uploaded_file = st.file_uploader("Upload an image of your food", type=["jpg", "jpeg", "png"])
image = ""
if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Image.", use_column_width=True)  # show the image

submit = st.button("Submit & Process")  # creates a "Submit & Process" button

The above steps contain all the components of the app. At this point, the user can open the app, enter a question, and upload an image.

Finally, let’s put all the parts together so that once the user clicks the “Submit & Process” button, they get the required response text.

# Once the "Submit & Process" button is clicked
if submit:
    image_data = input_image_setup(uploaded_file)
    response = get_gemini_reponse(input_prompt, image_data, input)
    st.subheader("The Response is")
    st.write(response)
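
As written, clicking the button without uploading an image raises FileNotFoundError inside input_image_setup. One option is to validate the inputs first and show a message instead. The helper below is a hypothetical sketch. Whether an empty question should also be rejected is a design choice, not something the original app enforces.

```python
def validate_submission(question, uploaded_file):
    """Return a user-facing error message, or None if the inputs look usable."""
    if uploaded_file is None:
        return "Please upload an image of your food first."
    if not question or not question.strip():
        return "Please enter a question about your food."
    return None


# Possible use inside the `if submit:` block of calories.py:
#     error = validate_submission(input, uploaded_file)
#     if error:
#         st.warning(error)
#     else:
#         ... call input_image_setup and get_gemini_reponse as before ...
```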

5. Run the script and interact with your app

Now that the app development is complete, you can run it in the terminal using the following command:

streamlit run calories.py

To interact with your app and see how it is performing, view your Streamlit app in your browser using the generated local or network URL.

Demo images of our Calorie Counter app in a Chrome browser

This is what your Streamlit app looks like when you first open it in the browser.

Demo image of the first view of the Calorie Counter app: photo by author.

Once the user asks a question and uploads an image, it looks like this:

Demo image of Calorie Counter app with user input question and user uploaded image: Photo by author. The image of the food loaded in the app: Photo by Odiseo Castrejon on Unsplash

Once the user presses the “Submit & Process” button, the response shown in the image below is generated at the bottom of the screen.

Demo image of Calorie Counter app with the generated answer: Photo by author

The next steps

For remote access, you can consider deploying your app using cloud services such as AWS, Heroku, or Streamlit Community Cloud. In this case, we are using Streamlit Community Cloud to deploy the app for free.

Click ‘Deploy’ at the top right of the app screen and follow the prompts to complete the deployment.

After deployment, you can share the generated app URL with other users.

Limitations of the Calorie Counter App

As with other AI applications, the results generated are the best estimates of the model. So before you fully rely on the app, you should consider the following potential risks:

  • The calorie counter app may misclassify certain foods and therefore display the wrong number of calories.
  • The app has no reference point to estimate the size of the food — portion — based on the uploaded image. This can lead to errors.
  • Relying too much on the app can lead to stress and mental health issues, as one can become obsessed with counting calories and worry about results that may not be accurate.

Reduce the app’s limitations

To reduce the risks associated with using the calorie counter, here are some possible improvements that could be incorporated into its development:

  • Add contextual analysis of the image, which helps to measure the size of the analyzed food portion. For example, the app can be built in such a way that a standard object such as a spoon, included in the food image, can be used as a reference point for measuring the size of the food items. This will reduce errors in the resulting total calories.
  • Google could improve the diversity of foods in their training set to reduce misclassification errors. They could expand it to include foods from more cultures, so that even rare African foods are identified.
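
The reference-object idea from the first bullet comes down to simple proportional scaling. The sketch below shows only that arithmetic, not anything the current app does. The function name and the example numbers are made up, the pixel measurements are assumed to come from some separate detection step, and the calculation only holds if the reference object and the food lie in roughly the same plane at the same distance from the camera.

```python
def estimate_length_cm(item_px, reference_px, reference_cm):
    """Scale a pixel measurement to centimetres using a reference object.

    item_px:      measured size of the food item in the image, in pixels
    reference_px: measured size of the reference object, in pixels
    reference_cm: known real-world size of the reference object, in cm
    """
    if reference_px <= 0:
        raise ValueError("reference_px must be positive")
    return item_px * reference_cm / reference_px


# Illustrative numbers only: a 15 cm spoon spanning 300 px, a plate spanning 520 px
plate_cm = estimate_length_cm(item_px=520, reference_px=300, reference_cm=15)
print(f"Estimated plate diameter: {plate_cm:.1f} cm")
```

An estimate like this could be added to the prompt as extra context so the model has a sense of scale when judging portion size.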


Using Gemini-1.5-Pro-Latest for Smarter Eating was originally published in Towards Data Science on Medium.