Overview of the elements
Set Up API Keys
- Obtain API keys for Pinecone and OpenAI.
- Store the keys securely, for example in a credentials.py or environment file that is kept out of version control.
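One way to keep keys out of the source is to read them from environment variables at startup. A minimal sketch (the `mask_key` helper is an illustrative addition, not part of any library):

```python
import os

# Load API keys from environment variables rather than hard-coding them
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
PINECONE_API_KEY = os.environ.get("PINECONE_API_KEY", "")

def mask_key(key: str) -> str:
    """Return a masked form of an API key that is safe to print in logs."""
    if len(key) <= 4:
        return "****"
    return key[:4] + "*" * (len(key) - 4)
```

A credentials.py file that simply assigns the two keys works the same way, as long as it is listed in .gitignore.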
Install Required Libraries
Install the necessary Python libraries, such as Pinecone's Python client (pinecone-client) and OpenAI's Python client (openai).
pip install pinecone-client openai
Create a Language Model Chain
Use OpenAI's language model (e.g., GPT-4) to understand and process questions.
Utilise Pinecone for efficient similarity search to find relevant documents based on the processed questions.
Process Questions with OpenAI
Use the OpenAI API to process questions and generate relevant information.
from openai import OpenAI

# Create a client with your OpenAI API key
client = OpenAI(api_key="your_openai_api_key")

# Example: Generate a response from GPT-4
question = "What is the capital of France?"
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": question}],
    max_tokens=100,
)
answer = response.choices[0].message.content.strip()
Index Documents with Pinecone
Index your documents in Pinecone to make them searchable.
from pinecone import Pinecone

# Create a Pinecone client and connect to an existing index
pc = Pinecone(api_key="your_pinecone_api_key")
index = pc.Index("your_index_name")

# Example: Upsert a document vector
document_vector = [0.1, 0.2, 0.3]  # Replace with your actual document embedding
document_id = "doc_1"  # Replace with your actual document ID
index.upsert(vectors=[(document_id, document_vector)])
Search for Relevant Documents with Pinecone
Use Pinecone to find relevant documents based on the processed questions.
# Example: Search for relevant documents
query_vector = [0.2, 0.1, 0.4]  # Replace with your actual query embedding
response = index.query(vector=query_vector, top_k=5)
relevant_documents = response["matches"]  # Matches ranked by similarity, each with an id and score
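What the `top_k` query does under the hood is rank stored vectors by similarity to the query vector. A toy illustration of cosine-similarity ranking in pure Python (no Pinecone), using the same placeholder vectors as above:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, documents, k=5):
    """Rank document vectors by cosine similarity to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

documents = {"doc_1": [0.1, 0.2, 0.3], "doc_2": [0.9, 0.1, 0.0]}
results = top_k([0.2, 0.1, 0.4], documents, k=2)
```

Pinecone performs the same kind of ranking, but over millions of vectors with an approximate nearest-neighbour index rather than a linear scan.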
Combine Results and Provide Answers
Combine the results from OpenAI and Pinecone to generate final answers.
# Combine OpenAI and Pinecone results
final_answer = f"Question: {question}\nAnswer (from OpenAI): {answer}\nRelevant Documents: {relevant_documents}"
print(final_answer)
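Pulling the steps above together, the question-answer flow can be sketched as a single function. The embedder, searcher, and generator are passed in as callables (an assumption for testability: in practice these would wrap the OpenAI embeddings call, `index.query`, and the chat-completion call):

```python
def answer_question(question, embed, search, generate, top_k=5):
    """End-to-end QA: embed the question, retrieve documents, generate an answer."""
    query_vector = embed(question)            # e.g. an OpenAI embedding of the question
    matches = search(query_vector, top_k)     # e.g. index.query(...)["matches"]
    context = "\n".join(m["text"] for m in matches)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)                   # e.g. a GPT-4 chat completion
```

Keeping the three dependencies injectable makes the chain easy to unit-test with stubs before wiring in the live APIs.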
Testing responses for LLM question / answer chain
Common settings
temperature = 0
model = "gpt-4"
Question - "Harvest sources was there representative samples for Torres Strait?"
Responses
Test 1 - map rerank
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_rerank")
Response
This document does not answer the question
Test 2 - map reduce
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_reduce")
Response
The text does not provide information on whether there were representative samples for Torres Strait.
Test 3 - refine
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="refine")
Response
The text provides additional information about the difficulties encountered in organizing sample collection in Queensland. Despite positive conversations with several industry members and the dispatch of sample kits, further Queensland sample collection was not undertaken due to the white spot issue, movement restrictions, and workload for industry and regulatory stakeholders. In total, 273 samples were collected from 20 of a possible 31 sources. However, the text still does not specifically mention if any of these samples were from Torres Strait.
Test 4 - stuff
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="stuff")
Response
The text does not provide information on whether there were representative samples for Torres Strait.
Chain type set to refine
temperature = 0
chain = load_qa_chain(llm, chain_type="refine")
The most comprehensive outcome was achieved when the chain_type was configured to 'refine'. So what happens when the temperature is changed to 0.7?
The text still does not specify whether any samples were collected from Torres Strait. It only mentions that there were challenges in collecting samples from Queensland due to factors like the white spot disease outbreak and movement restrictions. In summary, 273 samples were collected from 20 out of a potential 31 sources, but it's not clear if Torres Strait was one of these sources.
Question - "What was the process for collecting the prawns?"
Responses
Test 1 - refine
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="refine")
Response
The process for collecting prawns involved an initial assessment of the Australian Council of Prawn Fisheries (ACPF) and Australian Prawn Farmers Association (APFA) harvest and production areas. This led to the identification of 35 wild harvest sources and 19 farmed prawn sources. Each source was defined as a specific harvest area and a single species harvested or farmed from that area. If more than one species was harvested, a single harvest area could be defined as several sources. For instance, Shark Bay was one harvest area but was considered three or potentially four sources due to the harvesting of Western Kings, Brown Tigers, Endeavours, and Coral prawns from that area.
A sampling and chain of custody protocol was developed based on forensic sample collection and analysis. Different collection methods and personnel were used to collect the prawn samples. Between 2016 and 2019, 273 wild harvest and 136 farmed prawn samples were collected from 21 wild harvest sources and 17 farmed prawn sources respectively. These samples were then forwarded to the Seafood CRC Company (SCI) for analysis.
Test 2 - map reduce
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_reduce")
Response
The process for collecting the prawns was based on a sampling and chain of custody protocol and documentation developed for prawn collection. This was based on forensic sample collection and analysis. A variety of different collection methods and personnel were used.
Test 3 - map rerank
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_rerank")
Response
A sampling and chain of custody protocol and documentation for prawn collection was developed, based on forensic sample collection and analysis. Different collection methods and personnel were used, and between 2016 and 2019, 273 wild harvest and 136 farmed prawn samples were collected from 21 wild harvest sources and 17 farmed prawn sources respectively and forwarded to SCI for analysis.
Test 4 - stuff
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="stuff")
Response
A sampling and chain of custody protocol was developed for prawn collection, based on forensic sample collection and analysis. Different collection methods and personnel were used to collect the prawns. The prawns were then forwarded to SCI for analysis.