It seems like you're using the CharacterTextSplitter class to split text into chunks. Note that this class comes from LangChain, not from the tiktoken library; tiktoken only supplies the tokenizer. The CharacterTextSplitter.from_tiktoken_encoder() method creates an instance of CharacterTextSplitter that uses a tiktoken encoder to measure the length of each chunk.

Breakdown of the parameters used in this method

separator

This parameter specifies the character or sequence of characters on which the text is split. In your example, the separator is set to "\n", so the text is first split at newline characters, and the resulting pieces are then merged back into chunks up to the size limit. If you want to split the text differently, change the separator to another character or sequence of characters, such as "\n\n" for paragraph breaks.
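The first step, splitting on the separator, can be sketched in plain Python. This is a simplified illustration rather than LangChain's actual code, and the sample text is made up for the example:

```python
# Illustrative only: split on the separator and drop empty pieces,
# which approximates the first step the splitter performs.
text = "First line\nSecond line\n\nFourth line"

separator = "\n"
pieces = [piece for piece in text.split(separator) if piece]

print(pieces)  # ['First line', 'Second line', 'Fourth line']
```

The splitter then merges these pieces into chunks, so the separator determines where a chunk boundary is allowed to fall.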

chunk_size

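This parameter sets the maximum size of each chunk. Because the splitter is created with from_tiktoken_encoder(), the size is measured in tokens as counted by the tiktoken encoder, not in characters. In your example, the chunk size is set to 1000, so each chunk contains up to 1000 tokens. A single piece between two separators that is longer than this limit is not split further, so such a chunk can exceed the limit.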
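To see why the unit matters, the sketch below compares a character count with a token count. The rough_token_count function is a hypothetical stand-in that counts words and punctuation marks; a real tiktoken encoder will give different numbers, but the count is typically well below the number of characters in the same way:

```python
import re


def rough_token_count(text: str) -> int:
    """Hypothetical stand-in for a tokenizer: counts words and punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


sample = "Splitting text by tokens, not characters."

print(len(sample))                # length in characters
print(rough_token_count(sample))  # length in rough "tokens", a much smaller number
```

A limit of 1000 therefore allows considerably more text per chunk when it is counted in tokens than when it is counted in characters.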

chunk_overlap

This parameter specifies how much content is shared between adjacent chunks, measured in the same unit as chunk_size (tokens here). In your example, the overlap is set to 100, so up to roughly the last 100 tokens of one chunk are repeated at the start of the next chunk. Because the splitter only breaks at the separator, the overlap is made up of whole pieces and may be shorter than 100 tokens.

After creating an instance of CharacterTextSplitter with these settings, you can use it to split your text into chunks. Here's how you can use it:

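# CharacterTextSplitter is part of LangChain; tiktoken must also be installed
# (older LangChain versions: from langchain.text_splitter import CharacterTextSplitter)
from langchain_text_splitters import CharacterTextSplitter

text = "Your text goes here..."
# chunk_size and chunk_overlap are measured in tokens
splitter = CharacterTextSplitter.from_tiktoken_encoder(separator="\n", chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(text)  # returns a list of strings
for chunk in chunks:
    print(chunk)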

In this example, replace "Your text goes here..." with your actual text, and the code will split the text into chunks based on the provided settings and print each chunk.
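How chunk_size and chunk_overlap interact can be sketched with a small pure-Python merge function. This is a simplified illustration of the general split-then-merge idea, not LangChain's implementation; merge_pieces and its length parameter are hypothetical names, and lengths are counted in characters here only to keep the example free of dependencies:

```python
from typing import Callable, List


def merge_pieces(
    pieces: List[str],
    separator: str,
    chunk_size: int,
    chunk_overlap: int,
    length: Callable[[str], int] = len,
) -> List[str]:
    """Greedily pack pieces into chunks, carrying trailing pieces forward as overlap."""
    chunks: List[str] = []
    current: List[str] = []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and length(candidate) > chunk_size:
            # The current chunk is full: emit it.
            chunks.append(separator.join(current))
            # Drop pieces from the front until the remainder fits the overlap
            # budget and leaves room for the new piece.
            while current and (
                length(separator.join(current)) > chunk_overlap
                or length(separator.join(current + [piece])) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


pieces = ["aaaa", "bbbb", "cccc", "dddd"]
chunks = merge_pieces(pieces, separator="\n", chunk_size=10, chunk_overlap=5)
print(chunks)  # ['aaaa\nbbbb', 'bbbb\ncccc', 'cccc\ndddd']

no_overlap = merge_pieces(pieces, separator="\n", chunk_size=10, chunk_overlap=0)
print(no_overlap)  # ['aaaa\nbbbb', 'cccc\ndddd']
```

With an overlap of 5, each chunk starts with the last piece of the previous one; with an overlap of 0, no piece is repeated. Passing a token-counting function as length would make the same logic work in tokens.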
