Data Format

The data file contained in ChatterBot Corpus is formatted using YAML syntax. This format is used because it is easily readable by both humans and machines.

Corpus Properties
Property Required Description
categories Required A list of categories that describe the conversations.
conversations Optional A list of conversations. Each conversation is denoted as a list.

Here is an example of the corpus data:

categories:
- english
- greetings
conversations:
- - Hello
  - Hi
- - Hello
  - Hi, how are you?
  - I am doing well.
- - Good day to you sir!
  - Why thank you.
- - Hi, How is it going?
  - It's going good, your self?
  - Mighty fine, thank you.

The values in this example have the following relationships.

Evaluated statement relationships
Statement Response
Hello Hi
Hello Hi, how are you?
Hi, how are you? I am doing well.
Good day to you sir! Why thank you.
Hi, How is it going? It’s going good, your self?
It’s going good, your self? Mighty fine, thank you.