We built our client a custom ChatGPT bot that they could train to answer questions based on files in a Google Drive folder.
That way the client could add files to the folder and automatically update the bot's knowledge base in the future.
We promised the client it could be built in two weeks. We delivered it in just one week.
- The client is a nutritionist who spent a significant amount of time each day answering patient intake questions
- They wanted a chatbot to deflect these repetitive questions
- They had a folder of 50+ documents, in various file types, containing the answers to the most common questions patients ask
- They wanted the chatbot to use those files as context when answering patient intake questions
- They were in the process of rebuilding their website on Bubble.io, a no-code UI platform
- They needed help building a chatbot, integrated with their Bubble site, that patients could ask questions and that would answer using their folder of files as context
- The overall tech stack: Bubble.io (front-end) + Google Drive (context storage) + a Python Flask server (connecting the Bubble front-end to OpenAI's GPT-3.5 API) + Heroku (deployments)
- We built a demo Bubble project with a form you could type questions into
- We built a Python Flask server with two main API endpoints: index() and query()
- index() scans the files in the client's Google Drive folder and builds an "index" using llama-index that ChatGPT can query later
- query() takes a patient question and answers it using the context stored in the "index"
- We then connected the Bubble UI to these APIs so the form could send it questions
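A minimal sketch of what the two-endpoint Flask server looks like. The llama-index calls reflect the library's API around mid-2023 (GPTSimpleVectorIndex); names like INDEX_PATH and DOCS_DIR are illustrative, not the client's actual configuration:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
INDEX_PATH = "index.json"   # where the serialized llama-index lives
DOCS_DIR = "drive_files"    # folder contents, synced locally from Google Drive

@app.route("/index")
def index():
    # Re-read every document and rebuild the index. This calls OpenAI to
    # compute embeddings, so each run costs money.
    from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader
    docs = SimpleDirectoryReader(DOCS_DIR).load_data()
    GPTSimpleVectorIndex.from_documents(docs).save_to_disk(INDEX_PATH)
    return jsonify({"status": "reindexed"})

@app.route("/query")
def query():
    # Answer a patient question using the stored index as context.
    from llama_index import GPTSimpleVectorIndex
    question = request.args.get("question", "")
    idx = GPTSimpleVectorIndex.load_from_disk(INDEX_PATH)
    return jsonify({"answer": str(idx.query(question))})
```

The Bubble front-end then only needs to make plain HTTP calls to these two routes.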
Zapier vs. Custom code
The client initially misunderstood the scope of the work: they thought you could connect a folder of their files to ChatGPT with a no-code tool like Zapier.
This is actually not possible, because the text in their files totaled around 200k tokens, while ChatGPT's context limit was 4k tokens at the time of the engagement (as of this writing on 6/12/23, OpenAI has released a new version supporting 16k-token contexts). This meant we had to index their files before querying, which is beyond what Zapier is capable of doing.
Although a solution involving Zapier could exist, the "hacky" version would have taken longer than implementing it properly in custom code, so we went with the custom code solution instead.
Had their context been under 4k tokens, we could have skipped the indexing and called OpenAI's API directly, passing the context in the prompt. In that case, a Zapier solution would be more feasible.
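A hypothetical sketch of that no-index path: if the whole folder's text fits under the model's token limit, it can be passed straight into the prompt. The build_messages helper and variable names below are illustrative, not client code:

```python
def build_messages(context: str, question: str) -> list:
    """Pack the folder's text into the system message. Only viable when
    context + question together fit within the model's context window."""
    return [
        {"role": "system",
         "content": "Answer using only this context:\n" + context},
        {"role": "user", "content": question},
    ]

# The actual call (requires the `openai` package and an API key), using the
# ChatCompletion interface available at the time of the engagement:
# import openai
# reply = openai.ChatCompletion.create(
#     model="gpt-3.5-turbo",
#     messages=build_messages(folder_text, patient_question),
# )
```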
Integration with Google Drive
We integrated directly with Google Drive's API because the client was non-technical and we did not want to change their existing workflow.
Integration was simple; however, it required us to create a "service account" so our server could programmatically access their Google Drive folder, with the private keys set as environment variables on their Heroku server.
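A sketch of how that fits together, assuming the standard google-auth / google-api-python-client libraries; the environment variable name and FOLDER_ID are illustrative:

```python
import json
import os

def load_service_account_info(var: str = "GOOGLE_SERVICE_ACCOUNT_JSON") -> dict:
    # Heroku config vars can only hold strings, so the service-account key
    # file is stored as a single JSON string and parsed at startup.
    return json.loads(os.environ[var])

# Hypothetical usage with google-auth / google-api-python-client:
# from google.oauth2 import service_account
# from googleapiclient.discovery import build
# creds = service_account.Credentials.from_service_account_info(
#     load_service_account_info(),
#     scopes=["https://www.googleapis.com/auth/drive.readonly"],
# )
# drive = build("drive", "v3", credentials=creds)
# files = drive.files().list(q=f"'{FOLDER_ID}' in parents").execute()
```

The service account's email just needs to be granted read access to the client's folder, so their day-to-day workflow doesn't change.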
Automating Reindexing (or Not)
There was a complication with updates to the Google Drive folder: changes were not picked up automatically. Each time the client added a file, the folder needed to be "reindexed". However, we had to be careful about when we reindexed, because each reindex costs money, so we only wanted to reindex when something had actually changed.
We presented the client with three options:
1) We could set up a Zap on the Google Drive folder so that whenever a change occurred, it triggered a call to the index() API to reindex the folder.
2) We could set up a cron job that periodically scanned the Google Drive folder for changes and, if one was detected, called the index() API.
3) The client could simply remember to call the index() API manually by navigating to a URL.
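The core check behind option 2 could be sketched as follows (in a real cron job the timestamps would come from the Drive API's `modifiedTime` field; here they are plain datetimes for illustration):

```python
from datetime import datetime, timezone

def needs_reindex(file_mtimes, last_indexed):
    """Reindex only if some file was modified after the last index run,
    since every reindex burns OpenAI tokens. `file_mtimes` is an iterable
    of file modification datetimes; `last_indexed` is the datetime of the
    previous index run."""
    return any(mtime > last_indexed for mtime in file_mtimes)
```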
Since the client was budget-conscious and did not expect many changes to the folder, they decided that remembering the extra reindexing step each time they added a file was not a big deal, so we went with option 3.
This was a reminder that not everything has to be automated; it's worth weighing how frequently a problem occurs against budget constraints when making these decisions.
We also discussed with the client the three main costs to consider when building out a chatbot like this:
1. OpenAI API costs - every query or indexing run calls OpenAI's endpoints, priced in dollars per token
2. Cloud hosting costs - the Flask backend on Heroku (the client's traffic was low enough to stay on the free tier)
3. Document and context storage costs - free in this case because the context was relatively small, but this can get expensive for clients with lots of data who store documents on S3 and the corresponding embeddings in vector DBs like Pinecone
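Per-token pricing makes cost 1 easy to estimate. The rate below is purely illustrative (roughly gpt-3.5-turbo's per-1k-token price in mid-2023; indexing uses embedding models priced differently, so check OpenAI's pricing page for real numbers):

```python
def openai_cost_usd(tokens: int, usd_per_1k: float = 0.002) -> float:
    # Illustrative rate, not current pricing: dollars per 1,000 tokens.
    return tokens / 1000 * usd_per_1k

# A full reindex of the client's ~200k tokens of documents at that rate:
# openai_cost_usd(200_000) -> 0.4 (USD per reindex)
```

This back-of-the-envelope math is also why we avoided reindexing when nothing had changed.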
Need your own ChatGPT bot built?
We will build your chatbot in two months. Reach out to us at allinengineeringconsulting.com.