Skip to main content
Table of contents

Overview of the elements

Set Up API Keys

  • Obtain API keys for Pinecone and OpenAI.
  • Store the keys securely. Consider using a credentials.py file (as mentioned in a previous response).

 

Install Required Libraries

Install the necessary Python libraries, such as Pinecone's Python client (pinecone-client) and OpenAI's Python client (openai).

pip install pinecone-client openai

 

Create a Language Model Chain

Use OpenAI's language model (e.g., GPT-4) to understand and process questions.

Utilise Pinecone for efficient similarity search to find relevant documents based on the processed questions.

 

Process Questions with OpenAI

Use the OpenAI API to process questions and generate relevant information.

import openai
# Set your OpenAI API key
openai.api_key = "your_openai_api_key"
# Example: Generate response from GPT-3
response = openai.Completion.create(
   engine="text-davinci-003",
   prompt="What is the capital of France?",
   max_tokens=100,
)
answer = response.choices[0].text.strip()

 

Index Documents with Pinecone

Index your documents in Pinecone to make them searchable.

from pinecone import Pinecone
# Set your Pinecone API key and environment
pinecone_api_key = "your_pinecone_api_key"
pinecone_api_env = "your_pinecone_api_environment"
# Create a Pinecone client
pinecone = Pinecone(api_key=pinecone_api_key, environment=pinecone_api_env)
# Example: Index a document
document_vector = [0.1, 0.2, 0.3]  # Replace with your actual document vector
document_id = "doc_1"  # Replace with your actual document ID
pinecone.index(index_name="your_index_name", data=[(document_id, document_vector)])

 

Search for Relevant Documents with Pinecone

Use Pinecone to find relevant documents based on the processed questions.

# Example: Search for relevant documents
query_vector = [0.2, 0.1, 0.4]  # Replace with your actual query vector
response = pinecone.query(index_name="your_index_name", query_vector=query_vector, top_k=5)
relevant_documents = response.data  # Extract relevant documents based on similarity

 

Combine Results and Provide Answers

Combine the results from OpenAI and Pinecone to generate final answers.

# Combine OpenAI and Pinecone results
final_answer = f"Question: {question}\nAnswer (from OpenAI): {answer}\nRelevant Documents: {relevant_documents}"
print(final_answer)

 

 

Testing responses for LLM question / answer chain

Common settings

temperature = 0
model = "gpt-4"

 

Question - "Harvest sources was there representative samples for Torres Strait?"

 

Responses

Test 1 - map rerank

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_rerank")
Response

This document does not answer the question

 

Test 2 - map reduce

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_reduce")
Response

The text does not provide information on whether there were representative samples for Torres Strait.

 

Test 3 - refine

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="refine")
Response

The text provides additional information about the difficulties encountered in organizing sample collection in Queensland. Despite positive conversations with several industry members and the dispatch of sample kits, further Queensland sample collection was not undertaken due to the white spot issue, movement restrictions, and workload for industry and regulatory stakeholders. In total, 273 samples were collected from 20 of a possible 31 sources. However, the text still does not specifically mention if any of these samples were from Torres Strait.

 

Test 4 - stuff

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="stuff")
Response

The text does not provide information on whether there were representative samples for Torres Strait.

 

Chain type set to refine

temperature = 0
chain = load_qa_chain(llm, chain_type="refine")

The most comprehensive outcome was achieved when the chain_type was configured to 'refine'.  Therefore, what happens when is the temperature is changed to 0.7?

The text still does not specify whether any samples were collected from Torres Strait. It only mentions that there were challenges in collecting samples from Queensland due to factors like the white spot disease outbreak and movement restrictions. In summary, 273 samples were collected from 20 out of a potential 31 sources, but it's not clear if Torres Strait was one of these sources.

 

 

Question - "What was the process for collecting the prawns?"

 

Responses

Test 1 - refine

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="refine")
Response

The process for collecting prawns involved an initial assessment of the Australian Council of Prawn Fisheries (ACPF) and Australian Prawn Farmers Association (APFA) harvest and production areas. This led to the identification of 35 wild harvest sources and 19 farmed prawn sources. Each source was defined as a specific harvest area and a single species harvested or farmed from that area. If more than one species was harvested, a single harvest area could be defined as several sources. For instance, Shark Bay was one harvest area but was considered three or potentially four sources due to the harvesting of Western Kings, Brown Tigers, Endeavours, and Coral prawns from that area.

A sampling and chain of custody protocol was developed based on forensic sample collection and analysis. Different collection methods and personnel were used to collect the prawn samples. Between 2016 and 2019, 273 wild harvest and 136 farmed prawn samples were collected from 21 wild harvest sources and 17 farmed prawn sources respectively. These samples were then forwarded to the Seafood CRC Company (SCI) for analysis.

 

Test 2 - map reduce

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_reduce")
Response

The process for collecting the prawns was based on a sampling and chain of custody protocol and documentation developed for prawn collection. This was based on forensic sample collection and analysis. A variety of different collection methods and personnel were used.

 

Test 3 - map rerank

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_rerank")
Response

A sampling and chain of custody protocol and documentation for prawn collection was developed, based on forensic sample collection and analysis. Different collection methods and personnel were used, and between 2016 and 2019, 273 wild harvest and 136 farmed prawn samples were collected from 21 wild harvest sources and 17 farmed prawn sources respectively and forwarded to SCI for analysis.

 

Test 4 - stuff

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="stuff")
Response

A sampling and chain of custody protocol was developed for prawn collection, based on forensic sample collection and analysis. Different collection methods and personnel were used to collect the prawns. The prawns were then forwarded to SCI for analysis.

 

 

Related articles

Andrew Fletcher17 Feb 2024
Drupal - Solr working through tm_X3b_en_body error
Having updated Solr, re-indexing wasn't working. The error in the logs wasDrupal\search_api_solr\SearchApiSolrException while indexing item entity:node/2386:en: Solr endpoint http://127.0.0.1:8983/ bad request (code: 400, body: Exception writing document id...