Overview of the elements
Set Up API Keys
- Obtain API keys for Pinecone and OpenAI.
- Store the keys securely, for example in a credentials.py or environment file that is kept out of version control.
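One way to keep keys out of the source is to read them from environment variables at startup. A minimal sketch (the `mask_key` helper is an illustrative addition, not part of any library):

```python
import os

# Load API keys from environment variables rather than hard-coding them
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
PINECONE_API_KEY = os.environ.get("PINECONE_API_KEY", "")

def mask_key(key: str) -> str:
    """Return a masked form of an API key that is safe to print in logs."""
    if len(key) <= 4:
        return "****"
    return key[:4] + "*" * (len(key) - 4)
```

A credentials.py file that simply assigns the two keys works the same way, as long as it is listed in .gitignore.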
Install Required Libraries
Install the necessary Python libraries, such as Pinecone's Python client (pinecone-client) and OpenAI's Python client (openai).
pip install pinecone-client openai
Create a Language Model Chain
Use OpenAI's language model (e.g., GPT-4) to understand and process questions.
Utilise Pinecone for efficient similarity search to find relevant documents based on the processed questions.
Process Questions with OpenAI
Use the OpenAI API to process questions and generate relevant information.
from openai import OpenAI

# Create a client with your OpenAI API key
client = OpenAI(api_key="your_openai_api_key")

# Example: Generate a response from GPT-4
question = "What is the capital of France?"
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": question}],
    max_tokens=100,
)
answer = response.choices[0].message.content.strip()
Index Documents with Pinecone
Index your documents in Pinecone to make them searchable.
from pinecone import Pinecone

# Create a Pinecone client and connect to an existing index
pc = Pinecone(api_key="your_pinecone_api_key")
index = pc.Index("your_index_name")

# Example: Upsert a document vector
document_vector = [0.1, 0.2, 0.3]  # Replace with your actual document embedding
document_id = "doc_1"  # Replace with your actual document ID
index.upsert(vectors=[(document_id, document_vector)])
Search for Relevant Documents with Pinecone
Use Pinecone to find relevant documents based on the processed questions.
# Example: Search for relevant documents
query_vector = [0.2, 0.1, 0.4]  # Replace with your actual query embedding
response = index.query(vector=query_vector, top_k=5)
relevant_documents = response["matches"]  # Matches ranked by similarity, each with an id and score
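What the `top_k` query does under the hood is rank stored vectors by similarity to the query vector. A toy illustration of cosine-similarity ranking in pure Python (no Pinecone), using the same placeholder vectors as above:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, documents, k=5):
    """Rank document vectors by cosine similarity to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

documents = {"doc_1": [0.1, 0.2, 0.3], "doc_2": [0.9, 0.1, 0.0]}
results = top_k([0.2, 0.1, 0.4], documents, k=2)
```

Pinecone performs the same kind of ranking, but over millions of vectors with an approximate nearest-neighbour index rather than a linear scan.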
Combine Results and Provide Answers
Combine the results from OpenAI and Pinecone to generate final answers.
# Combine OpenAI and Pinecone results
final_answer = f"Question: {question}\nAnswer (from OpenAI): {answer}\nRelevant Documents: {relevant_documents}"
print(final_answer)
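Pulling the steps above together, the question-answer flow can be sketched as a single function. The embedder, searcher, and generator are passed in as callables (an assumption for testability: in practice these would wrap the OpenAI embeddings call, `index.query`, and the chat-completion call):

```python
def answer_question(question, embed, search, generate, top_k=5):
    """End-to-end QA: embed the question, retrieve documents, generate an answer."""
    query_vector = embed(question)            # e.g. an OpenAI embedding of the question
    matches = search(query_vector, top_k)     # e.g. index.query(...)["matches"]
    context = "\n".join(m["text"] for m in matches)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)                   # e.g. a GPT-4 chat completion
```

Keeping the three dependencies injectable makes the chain easy to unit-test with stubs before wiring in the live APIs.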
Testing responses for LLM question / answer chain
Common settings
temperature = 0
model = "gpt-4"
Question - "Harvest sources was there representative samples for Torres Strait?"
Responses
Test 1 - map rerank
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_rerank")
Response
This document does not answer the question
Test 2 - map reduce
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_reduce")
Response
The text does not provide information on whether there were representative samples for Torres Strait.
Test 3 - refine
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="refine")
Response
The text provides additional information about the difficulties encountered in organizing sample collection in Queensland. Despite positive conversations with several industry members and the dispatch of sample kits, further Queensland sample collection was not undertaken due to the white spot issue, movement restrictions, and workload for industry and regulatory stakeholders. In total, 273 samples were collected from 20 of a possible 31 sources. However, the text still does not specifically mention if any of these samples were from Torres Strait.
Test 4 - stuff
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="stuff")
Response
The text does not provide information on whether there were representative samples for Torres Strait.
Chain type set to refine
temperature = 0
chain = load_qa_chain(llm, chain_type="refine")
The most comprehensive outcome was achieved when the chain_type was configured to 'refine'. So what happens when the temperature is changed to 0.7?
The text still does not specify whether any samples were collected from Torres Strait. It only mentions that there were challenges in collecting samples from Queensland due to factors like the white spot disease outbreak and movement restrictions. In summary, 273 samples were collected from 20 out of a potential 31 sources, but it's not clear if Torres Strait was one of these sources.
Question - "What was the process for collecting the prawns?"
Responses
Test 1 - refine
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="refine")
Response
The process for collecting prawns involved an initial assessment of the Australian Council of Prawn Fisheries (ACPF) and Australian Prawn Farmers Association (APFA) harvest and production areas. This led to the identification of 35 wild harvest sources and 19 farmed prawn sources. Each source was defined as a specific harvest area and a single species harvested or farmed from that area. If more than one species was harvested, a single harvest area could be defined as several sources. For instance, Shark Bay was one harvest area but was considered three or potentially four sources due to the harvesting of Western Kings, Brown Tigers, Endeavours, and Coral prawns from that area.
A sampling and chain of custody protocol was developed based on forensic sample collection and analysis. Different collection methods and personnel were used to collect the prawn samples. Between 2016 and 2019, 273 wild harvest and 136 farmed prawn samples were collected from 21 wild harvest sources and 17 farmed prawn sources respectively. These samples were then forwarded to the Seafood CRC Company (SCI) for analysis.
Test 2 - map reduce
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_reduce")
Response
The process for collecting the prawns was based on a sampling and chain of custody protocol and documentation developed for prawn collection. This was based on forensic sample collection and analysis. A variety of different collection methods and personnel were used.
Test 3 - map rerank
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_rerank")
Response
A sampling and chain of custody protocol and documentation for prawn collection was developed, based on forensic sample collection and analysis. Different collection methods and personnel were used, and between 2016 and 2019, 273 wild harvest and 136 farmed prawn samples were collected from 21 wild harvest sources and 17 farmed prawn sources respectively and forwarded to SCI for analysis.
Test 4 - stuff
llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="stuff")
Response
A sampling and chain of custody protocol was developed for prawn collection, based on forensic sample collection and analysis. Different collection methods and personnel were used to collect the prawns. The prawns were then forwarded to SCI for analysis.