# Prompt Compression

After a few rounds of conversation, the chat history can become quite long, especially when it contains code and execution results.
This can cause the prompt to exceed the context window of the LLM.

To solve this problem, one way is to summarize the older rounds of the chat history
and keep only the latest rounds verbatim.

Another way is to store the chat history entries in a vector database and retrieve only the parts
relevant to the current user request. However, in TaskWeaver, code is also part of the chat history.
Skipping intermediate code and execution results is not an option if the code for the current user request
is to be generated correctly. Therefore, we choose the first way to solve the problem.

The following figure shows the idea of chat history summarization, where the chat history is divided into two parts:
- Rounds to compress: this part is summarized and only the summary is kept in the chat history. If the context_summary
    already exists, a new summary is generated from the previous summary and the rounds to be summarized.
- Rounds to retain: this part is kept in the chat history without summarization.

```mermaid
  flowchart LR
      ConversationSummary-->Round1
      subgraph Rounds to compress 
          Round1-->Round2
      end
      subgraph Rounds to retain
          Round2-->Round3-->Round4-->Round5    
      end
```
Imagine that, at the beginning, the ConversationSummary is empty.
Once the chat history reaches `rounds_to_compress` (default 2) plus `rounds_to_retain` (default 3) rounds,
the ConversationSummary is generated from the oldest `rounds_to_compress` rounds, and the latest `rounds_to_retain` rounds are kept in the chat history.
After that, only `rounds_to_retain` rounds remain in the chat history.
The next time the chat history reaches `rounds_to_compress` plus `rounds_to_retain` rounds,
the ConversationSummary is regenerated from the `rounds_to_compress` rounds and the previous ConversationSummary.
These two parameters control how often the chat history is summarized.

An example of the chat history summarization in the Code Generator is shown below:

```json
{h}(hhhhhK	uhh	uh(hhh}h}(hhkhhlusub$b5128fb5-2f99-48de-b632-810d0cf50782h
)}(h
}(hX    "ConversationSummary": "The user requested the generation of 100 random numbers, which was successfully executed. Then, the user asked to show the top 5 largest numbers from the generated random numbers. The assistant provided a code snippet to sort the generated random numbers in descending order and select the top 5 largest numbers, which was also successfully executed. After that, the user requested to plot the distribution of the 100 numbers, which was successfully executed. The user then asked to count the frequency of numbers in each bin of the histogram and identify the bin with the most numbers for the 0.1 bin width, which was also successfully executed.",h}(hhhhhK
uhh	uh(hhh}h}(hhthhuusub$1fbe3187-86ec-4147-8f79-597aed9a3d25h
)}(h
}(h  "Variables": [
    {
      "name": "random_numbers_100",
      "type": "numpy array",
      "description": "An array containing 100 random numbers generated using np.random.rand()"
    },
    {
      "name": "top_5_largest",
      "type": "numpy array",
      "description": "An array containing the top 5 largest numbers from the generated random numbers"
    }
  ]
}
```
The JSON object has two fields:
- ConversationSummary: the summary of the chat history.
- Variables: the variables in the chat history that could be used in the current user request.

The chat history summary of the Planner has only the ConversationSummary field.

The actual code generated in the summarized rounds is dropped and only the variables are kept in the summary
so that the LLM can still refer to these variables in future code generation.

One thing to note is that chat history summarization requires calling the LLM, which incurs additional latency and cost.
The prompts for chat history summarization can be found for the [planner](../../taskweaver/planner/compression_prompt.yaml)
and the [code generator](../../taskweaver/code_interpreter/code_generator/compression_prompt.yaml).
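
As a rough illustration of how the two summary fields described above might be consumed, the sketch below parses a shortened copy of the example summary and renders it as plain text. The helper `describe_summary` is a hypothetical name introduced here for illustration; it is not part of TaskWeaver.

```python
import json

# Shortened copy of the Code Generator summary example (text abridged for brevity).
summary_json = """
{
  "ConversationSummary": "The user requested the generation of 100 random numbers, which was successfully executed.",
  "Variables": [
    {
      "name": "random_numbers_100",
      "type": "numpy array",
      "description": "An array containing 100 random numbers generated using np.random.rand()"
    }
  ]
}
"""


def describe_summary(raw: str) -> str:
    """Render a summary object as text (hypothetical helper)."""
    data = json.loads(raw)
    lines = [f"Summary: {data['ConversationSummary']}"]
    # A Planner summary has no "Variables" field, so default to an empty list.
    for var in data.get("Variables", []):
        lines.append(f"- {var['name']} ({var['type']}): {var['description']}")
    return "\n".join(lines)


rendered = describe_summary(summary_json)
print(rendered)
```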

## Configurations
As explained above, two parameters control the chat history summarization:
`round_compressor.rounds_to_compress` (default 2) and `round_compressor.rounds_to_retain` (default 3).
To enable chat history summarization, set `planner.prompt_compression`
and `code_generator.prompt_compression` to `true`.

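A minimal sketch of the corresponding configuration entries, generated from Python so the JSON literals come out right. It assumes the configuration file uses flat, dotted keys at the top level; check your own `taskweaver_config.json` before copying the output.

```python
import json

# Assumption: options are stored as flat dotted keys at the top level.
config = {
    "planner.prompt_compression": True,
    "code_generator.prompt_compression": True,
    "round_compressor.rounds_to_compress": 2,
    "round_compressor.rounds_to_retain": 3,
}

# json.dumps converts Python's True into JSON's lowercase true.
config_text = json.dumps(config, indent=2)
print(config_text)
```

The two `round_compressor` entries are optional here because they repeat the defaults.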
# Configuration File
The configuration file is located at `project/taskweaver_config.json`.
You can edit this file to configure TaskWeaver.
The configuration file is in JSON format, so for boolean values use `true` or `false` instead of `True` or `False`.
For null values, use `null` instead of `None` or `"null"`. All other values should be strings in double quotes.
The following table lists the parameters in the configuration file:

| Parameter                                     | Description                                                                            | Default Value                                                                          |
|---|---|---|
| `llm.model` | The model name used by the language model. | `gpt-4` |
| `llm.backup_model` | The model name used for self-correction purposes. | `null` |
| `llm.api_base` | The base URL of the OpenAI API. | `https://api.openai.com/v1` |
| `llm.api_key` | The API key of the OpenAI API. | `null` |
| `llm.api_type` | The type of the OpenAI API, could be `openai` or `azure`. | `openai` |
| `llm.api_version` | The version of the OpenAI API. | `2023-07-01-preview` |
| `llm.response_format` | The response format of the OpenAI API, could be `json_object`, `text` or `null`. | `json_object` |
| `llm.embedding_api_type` | The type of the embedding API. | `sentence_transformers` |
| `llm.embedding_model` | The name of the embedding model. | `all-mpnet-base-v2` |
| `code_interpreter.code_verification_on` | Whether to enable code verification. | `false` |
| `code_interpreter.allowed_modules` | The list of allowed modules to import in code generation. | `["pandas", "matplotlib", "numpy", "sklearn", "scipy", "seaborn", "datetime", "typing"]`, if the list is empty, no modules would be allowed |
| `code_interpreter.blocked_functions` | The list of functions to block from code generation. | `["eval", "exec", "execfile", "compile", "open", "input", "raw_input", "reload"]` |
| `logging.log_file` | The name of the log file. | `taskweaver.log` |
| `logging.log_folder` | The folder to store the log file. | `logs` |
| `plugin.base_path` | The folder to store plugins. | `${AppBaseDir}/plugins` |
| `planner.example_base_path` | The folder to store planner examples. | `${AppBaseDir}/planner_examples` |
| `planner.prompt_compression` | Whether to compress the chat history for planner. | `false` |
| `planner.skip_planning` | Whether to skip LLM planning process and enable the default plan. | `false` |
| `planner.use_experience` | Whether to use experience summarized from the previous chat history in planner. | `false` |
| `code_generator.example_base_path` | The folder to store code interpreter examples. | `${AppBaseDir}/codeinterpreter_examples` |
| `code_generator.prompt_compression` | Whether to compress the chat history for code interpreter. | `false` |
| `code_generator.enable_auto_plugin_selection` | Whether to enable auto plugin selection. | `false` |
| `code_generator.use_experience` | Whether to use experience summarized from the previous chat history in code generator. | `false` |
| `code_generator.auto_plugin_selection_topk` | The number of auto selected plugins in each round. | `3` |
| `session.max_internal_chat_round_num` | The maximum number of internal chat rounds between Planner and Code Interpreter. | `10` |
| `session.code_interpreter_only` | Allow users to directly communicate with the Code Interpreter. | `false` |
| `round_compressor.rounds_to_compress` | The number of rounds to compress. | `2` |
| `round_compressor.rounds_to_retain` | The number of rounds to retain. | `3` |

> 💡 $\{AppBaseDir\} is the project directory.

> 💡 As of 11/30/2023, the `json_object` and `text` options of `llm.response_format` are only supported by OpenAI models of version 1106 or later. If you are using an older OpenAI model, set `llm.response_format` to `null`.

> 💡 Read [this](compression.md) for more information about `planner.prompt_compression` and `code_generator.prompt_compression`.

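The JSON literal rules mentioned for the configuration file can be checked with Python's standard `json` module. The snippet is a small sketch: the inline `raw_config` string stands in for the real file and contains only a few keys from the table.

```python
import json

# Stand-in for the contents of the configuration file.
raw_config = """
{
  "llm.model": "gpt-4",
  "llm.api_key": null,
  "code_interpreter.code_verification_on": false
}
"""

config = json.loads(raw_config)
# JSON null and false become Python None and False after parsing.
print(config["llm.api_key"], config["code_interpreter.code_verification_on"])

# Python-style literals such as True are not valid JSON and are rejected.
python_style_rejected = False
try:
    json.loads('{"code_interpreter.code_verification_on": True}')
except json.JSONDecodeError:
    python_style_rejected = True
print(python_style_rejected)
```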
# Experience

We build a long-term memory for TaskWeaver so that it can learn from previous experience and improve its future performance.
It can summarize the mistakes it made in the past and help avoid making the same mistakes again.
In addition, it can remember user preferences and use them to guide future conversations.

## How to use

1. To enable the experience feature, set the `planner.use_experience` and `code_generator.use_experience` parameters in the configuration file to `true`.
2. Start a new conversation with TaskWeaver. You will find that an `experience` directory has been created in your project directory. Note that it contains no experience yet because no chat history has been saved.
3. If you think the current chat history is worth saving, type the `/save` command; a new file named `raw_exp_{session_id}.yaml` will be created in the `experience` directory.
4. Restart TaskWeaver and start a new conversation. In the initialization stage, TaskWeaver reads the `raw_exp_{session_id}.yaml` file and writes a summary to a new file named `All_exp_{session_id}.yaml`. This process may take a while. The `All_` prefix denotes that this experience will be loaded for both the Planner and the CodeInterpreter.
5. When the user sends a similar query to TaskWeaver, it retrieves the relevant experience and loads it into the system prompt (for the Planner and the CodeInterpreter). In this way, the experience can be used to guide future conversations.
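
The file naming convention in the steps above can be illustrated with a short sketch that maps each `raw_exp_{session_id}.yaml` file to its `All_exp_{session_id}.yaml` counterpart. The function `pending_summaries` is hypothetical, and the assumption that already-summarized sessions are skipped is made here for illustration only; TaskWeaver's actual start-up logic may differ.

```python
import tempfile
from pathlib import Path
from typing import List

RAW_PREFIX = "raw_exp_"
ALL_PREFIX = "All_exp_"


def pending_summaries(experience_dir: Path) -> List[Path]:
    """Return the All_exp_ files that do not exist yet (hypothetical helper)."""
    pending = []
    for raw_file in sorted(experience_dir.glob(f"{RAW_PREFIX}*.yaml")):
        # Everything after the prefix is treated as the session id.
        session_id = raw_file.stem[len(RAW_PREFIX):]
        summary_file = experience_dir / f"{ALL_PREFIX}{session_id}.yaml"
        if not summary_file.exists():
            pending.append(summary_file)
    return pending


# Demo in a throwaway directory with made-up session ids.
with tempfile.TemporaryDirectory() as tmp:
    exp_dir = Path(tmp)
    (exp_dir / "raw_exp_session-a.yaml").touch()
    (exp_dir / "raw_exp_session-b.yaml").touch()
    (exp_dir / "All_exp_session-a.yaml").touch()
    pending_names = [p.name for p in pending_summaries(exp_dir)]

print(pending_names)
```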

## Example
h}(hj}  hj~  hKuhh	uh(hhh}h}(hj  hj  usub$cea964ad-92d7-45c2-882a-56d4c2c3f707h
)}(h
}(h1. Start a new conversation with TaskWeaver. Send a request "calculate the mean value of ./project/sample_data/data_demo.csv" to TaskWeaver.h}(hj}  hj~  hKuhh	uh(hhh}h}(hj  hj  usub$c7e43778-8df8-4006-a178-e1c2c8b287c4h
)}(h
}(hX  2. You will find TaskWeaver made a mistake, calculating the mean value of the non-numerical column. After several rounds of conversation, TaskWeaver just fixed this issue.
```
=========================================================
 ___         _     _       __
|_   _|_ _ ___| | _ | |     / /__  ___ __   ___  ___
  | |/ _` / __| |/ /| | /| / / _ \/ __ `/ | / / _ \/ ___/
  | | (_| \__ \   < | |/ |/ /  __/ /_/ /| |/ /  __/ /
  |_|\__,_|___/_|\_\|__/|__/\___/\__,_/ |___/\___/_/
=========================================================
 TaskWeaver ▶  I am TaskWeaver, an AI assistant. To get started, could you please enter your request?
    Human   ▶   calculate the mean value of d:/demo_data.csv
 ╭───< Planner >
 ├─► [init_plan]
 │   1. load the data file
 │   2. calculate the mean value of the loaded data <sequentially depends on 1>
 │   3. report the result to the user <interactively depends on 2>
 ├─► [plan]
 │   1. instruct CodeInterpreter to load the data file and calculate the mean value of the loaded data
 │   2. report the result to the user
 ├─► [current_plan_step] 1. instruct CodeInterpreter to load the data file and calculate the mean value of the loaded data
 ├──● Please load the data file d:/demo_data.csv and calculate the mean value of the loaded data
 ╰──● sending message to CodeInterpreter
 ╭───< CodeInterpreter >
 ├─► [thought] ProgramApe will generate a code snippet to load the data file d:/demo_data.csv and calculate the mean value of the loaded data using pandas.
 ├─► [python]
 │   import pandas as pd
 │   
 │   data_file_path = "d:/demo_data.csv"
 │   data_df = pd.read_csv(data_file_path)
 │   mean_value = data_df.mean()
 │   mean_value
 ├─► [verification] NONE
 ├─► [code_error] No code verification is performed.
 ├─► [execution_status] FAILURE
 ├─► [execution_result]
 │   The execution of the generated python code above has failed
 │   
 │   During execution, the following messages were logged:
 │   Traceback (most recent call last):
 │   
 │     Cell In[1], line 5
 │       mean_value = data_df.mean()
 │   
 │     File D:\python_project\Octopus\venv\Lib\site-packages\pandas\core\frame.py:11335 in mean
 │       result = super().mean(axis, skipna, numeric_only, **kwargs)
 │   
 │     File D:\python_project\Octopus\venv\Lib\site-packages\pandas\core\generic.py:11984 in mean
 │       return self._stat_function(
 │   
 │     File D:\python_project\Octopus\venv\Lib\site-packages\pandas\core\generic.py:11941 in _stat_function
 │       return self._reduce(
 │   
 │     File D:\python_project\Octopus\venv\Lib\site-packages\pandas\core\frame.py:11204 in _reduce
 │       res = df._mgr.reduce(blk_func)
 │   
 │     File D:\python_project\Octopus\venv\Lib\site-packages\pandas\core\internals\managers.py:1459 in reduce
 │       nbs = blk.reduce(func)
 │   
 │     File D:\python_project\Octopus\venv\Lib\site-packages\pandas\core\internals\blocks.py:377 in reduce
 │       result = func(self.values)
 │   
 │     File D:\python_project\Octopus\venv\Lib\site-packages\pandas\core\frame.py:11136 in blk_func
 │       ret
 │   
 ├─► [artifact_paths] []
 ├──● The following python code has been executed:
 │   ```python
 │   import pandas as pd
 │   
 │   data_file_path = "d:/demo_data.csv"
 │   data_df = pd.read_csv(data_file_path)
 │   mean_value = data_df.mean()
 │   mean_value
 │   ```
 │   
 │   
 │   The execution of the generated python code above has failed
 │   
 │   During execution, the following messages were logged:
 │   Traceback (most recent call last):
 │   
 │     Cell In[1], line 5
 │       mean_value = data_df.mean()
 │   
 │     File D:\python_project\Octopus\venv\Lib\site-packages\pandas\core\frame.py:11335 in mean
 │       result = super().mean(axis, skipna, numeric_only, **kwargs)
 │   
 │     File D:\python_project\Octopus\venv\Lib\site-packages\pandas\core\generic.py:11984 in mean
 │       return self._stat_function(
 │   
 │     File D:\python_project\Octopus\venv\Lib\site-packages\pandas\core\generic.py:11941 in _stat_function
 │       return self._reduce(
 │   
 │     File D:\python_project\Octopus\venv\Lib\site-packages\pandas\core\frame.py:11204 in _reduce
 │       res = df._mgr.reduce(blk_func)
 │   
 │     File D:\python_project\Octopus\venv\Lib\site-packages\pandas\core\internals\managers.py:1459 in reduce
 │       nbs = blk.reduce(func)
 │   
 │     File D:\python_project\Octopus\venv\Lib\site-packages\pandas\core\internals\blocks.py:377 in reduce
 │       result = func(self.values)
 │   
 │     File D:\python_project\Octopus\venv\Lib\site-packages\pandas\core\frame.py:11136 in blk_func
 │       ret
 │   
 ├─► [revise_message]
 │   The execution of the previous generated code has failed. If you think you can fix the problem by rewriting the code, please generate code and run it again.
 │   Otherwise, please explain the problem to me.
 ╰──● sending message to CodeInterpreter
 ╭───< CodeInterpreter >
 ├─► [thought] ProgramApe will generate a code snippet to load the data file d:/demo_data.csv and calculate the mean value of the loaded data using pandas, considering the execution 
 │   error.
 ├─► [python]
 │   import pandas as pd
 │   
 │   data_file_path = "d:/demo_data.csv"
 │   data_df = pd.read_csv(data_file_path)
 │   mean_value = data_df.mean(axis=0, numeric_only=True)
 │   mean_value
 ├─► [verification] NONE
 ├─► [code_error] No code verification is performed.
 ├─► [execution_status] SUCCESS
 ├─► [execution_result]
 │   The execution of the generated python code above has succeeded
 │   
 │   The result of above Python code after execution is:
 │   Count    78172.75
 │   dtype: float64
 ├─► [artifact_paths] []
 ├──● The following python code has been executed:
 │   ```python
 │   import pandas as pd
 │   
 │   data_file_path = "d:/demo_data.csv"
 │   data_df = pd.read_csv(data_file_path)
 │   mean_value = data_df.mean(axis=0, numeric_only=True)
 │   mean_value
 │   ```
 │   
 │   
 │   The execution of the generated python code above has succeeded
 │   
 │   The result of above Python code after execution is:
 │   Count    78172.75
 │   dtype: float64
 ╰──● sending message to Planner
 ╭───< Planner >
 ├─► [init_plan]
 │   1. load the data file
 │   2. calculate the mean value of the loaded data <sequentially depends on 1>
 │   3. report the result to the user <interactively depends on 2>
 ├─► [plan]
 │   1. instruct CodeInterpreter to load the data file and calculate the mean value of the loaded data
 │   2. report the result to the user
 ├─► [current_plan_step] 2. report the result to the user
 ├──● The mean value of the data in d:/demo_data.csv is 78172.75
 ╰──● sending message to User
 TaskWeaver ▶  The mean value of the data in d:/demo_data.csv is 78172.75
```
3. We `/save` the chat history and restart TaskWeaver. The experience summarized from the last chat history is shown below:
```yaml
exp_id: 20231228-073922-9b411afd
experience_text: "User Query: get the mean value of d:/demo_data.csv
  Error Resolution:
  - Error 1: The initial code failed to execute because it tried to calculate the mean value of all columns, including non-numeric columns.
  - Resolution 1: The code was revised to check for non-numeric columns and only calculate the mean value for numeric columns.
  Preferences:
  - Preference 1: The user prefers to calculate the mean value of the "Count" column in the data."
raw_experience_path: D:\project\experience\raw_exp_20231228-073922-9b411afd.yaml
embedding_model: text-embedding-ada-002
embedding: ...
```
4. Send a similar request, "calculate the variance value of ./project/sample_data/data_demo.csv", to TaskWeaver. You will find that TaskWeaver does not make the same mistake again. It asks the user to confirm which columns to use for the variance calculation.
```
=========================================================
 ___         _     _       __
|_   _|_ _ ___| | _ | |     / /__  ___ __   ___  ___
  | |/ _` / __| |/ /| | /| / / _ \/ __ `/ | / / _ \/ ___/
  | | (_| \__ \   < | |/ |/ /  __/ /_/ /| |/ /  __/ /
  |_|\__,_|___/_|\_\|__/|__/\___/\__,_/ |___/\___/_/
=========================================================
 TaskWeaver ▶  I am TaskWeaver, an AI assistant. To get started, could you please enter your request?
    Human   ▶  calculate the variance value of d:/demo_data.csv
 ╭───< Planner >
 ├─► [init_plan]
 │   1. load the data file
 │   2. confirm the columns to calculate variance <interactively depends on 1>
 │   3. calculate the variance value for the selected columns <sequentially depends on 2>
 │   4. report the result to the user <interactively depends on 3>
 ├─► [plan]
 │   1. instruct CodeInterpreter to load the data file
 │   2. confirm the columns to calculate variance
 ├─► [current_plan_step] 1. instruct CodeInterpreter to load the data file load the data file        <=�=>  h}(hj}  hj~  hK4uhh	uh(hhh}h}(hjQ  hjR  usub$0205701e-2b17-4bbb-943a-890527703bc4h
)}(h
}(h ├──● Please load the data file d:/demo_data.csva file d:/demo_dat <=�=>         
 ╰──● sending message to CodeInterpreter
 ╭───< CodeInterpreter >h}(hj}  hj~  hK5uhh	uh(hhh}h}(hjZ  hj[  usub$8a1ef86e-3cc0-43ba-bf1e-7dc18ef112f2h
)}(h
}(h ├─► [thought] ProgramApe will generate a code snippet to load the data file d:/demo_data.csv using pandas.demo_data.csv using panda     <=�=>     h}(hj}  hj~  hK6uhh	uh(hhh}h}(hjc  hjd  usub$5084b0d6-af90-4e4d-99cf-a6323830fd67h
)}(h
}(h ├─► [python]▶ [CodeInterpreter][python] import pandas as pd  data_file_path = "d:/demo_data.csv" df = pd.read_csv(data_fil      <=�=>    
 │   import pandas as pd
 │   
 │   data_file_path = "d:/demo_data.csv"h}(hj}  hj~  hK7uhh	uh(hhh}h}(hjl  hjm  usub$1655760c-3e13-4c23-a033-139213db8f85h
)}(h
}(h │   df = pd.read_csv(data_file_path)
 │   df
 ├─► [verification] NONE
 ├─► [code_error] No code verification is performed.
 ├─► [execution_status] SUCCESS
 ├─► [execution_result]h}(hj}  hj~  hK8uhh	uh(hhh}h}(hju  hjv  usub$fd8ab96c-7f3d-4ab9-845e-007dfcd965ddh
)}(h
}(h │   The execution of the generated python code above has succeeded
 │   
 │   The result of above Python code after execution is:
 │                  TimeBucket  Count
 │   0    2023-02-01T00:00:00Z  67814h}(hj}  hj~  hK9uhh	uh(hhh}h}(hj~  hj  usub$0032cab5-8907-4042-a134-02ee13769b2fh
)}(h
}(hw │   1    2023-02-01T04:00:00Z  84569
 │   ..                    ...    ...
 │   718  2023-05-31T16:00:00Z  89912h}(hj}  hj~  hK:uhh	uh(hhh}h}(hj  hj  usub$33fbae6d-4eac-4aaa-8964-7dc852f466cch
)}(h
}(h │   719  2023-05-31T20:00:00Z  87249
 │   
 │   [720 rows x 2 columns]
 ├─► [artifact_paths] []
 ├──● The following python code has been executed:
 │   ```python
 │   import pandas as pdh}(hj}  hj~  hK;uhh	uh(hhh}h}(hj  hj  usub$9e7e68f8-46a1-40c1-bb99-63fbc4cc113ah
)}(h
}(hX   │   
 │   data_file_path = "d:/demo_data.csv"
 │   df = pd.read_csv(data_file_path)
 │   df
 │   ```
 │   
 │   
 │   The execution of the generated python code above has succeeded
 │   
 │   The result of above Python code after execution is:h}(hj}  hj~  hK<uhh	uh(hhh}h}(hj  hj  usub$28f65e51-442f-4ea0-99ab-9a7092973ae6h
)}(h
}(h │                  TimeBucket  Count
 │   0    2023-02-01T00:00:00Z  67814
 │   1    2023-02-01T04:00:00Z  84569
 │   ..                    ...    ...h}(hj}  hj~  hK=uhh	uh(hhh}h}(hj  hj  usub$9f2da13b-f567-4adf-9661-1e43bff7de36h
)}(h
}(hu │   718  2023-05-31T16:00:00Z  89912
 │   719  2023-05-31T20:00:00Z  87249
 │   
 │   [720 rows x 2 columns]h}(hj}  hj~  hK>uhh	uh(hhh}h}(hj  hj  usub$1febf3e6-0672-4c18-9ec2-7edeae636507h
)}(h
}(h ╰──● sending message to Planner
 ╭───< Planner >
 ├─► [init_plan]
 │   1. load the data file
 │   2. confirm the columns to calculate variance <interactively depends on 1>h}(hj}  hj~  hK?uhh	uh(hhh}h}(hj  hj  usub$2660b7b1-95f0-4e5b-a88f-f2c14077fd39h
)}(h
}(h │   3. calculate the variance value for the selected columns <sequentially depends on 2>
 │   4. report the result to the user <interactively depends on 3>h}(hj}  hj~  hK@uhh	uh(hhh}h}(hj  hj  usub$509b180b-48e3-444f-9c3a-e246452a2944h
)}(h
}(h ├─► [plan] ▶ [Planner][plan] 1. instruct CodeInterpreter to load the data file 2. confirm the columns to calculate var       <=�=>   
 │   1. instruct CodeInterpreter to load the data file
 │   2. confirm the columns to calculate varianceh}(hj}  hj~  hKAuhh	uh(hhh}h}(hj  hj  usub$bf29845d-293a-4e8b-9f19-ef0bf900c69ch
)}(h
}(h ├─► [current_plan_step] 2. confirm the columns to calculate variance calculate varia      <=�=>    
 ├──● The data file d:/demo_data.csv is loaded. The columns in the data are:
 │   
 │   1. TimeBucket
 │   2. Count
 │   h}(hj}  hj~  hKBuhh	uh(hhh}h}(hj  hj  usub$6d5724fe-7cff-4076-a089-bdda1c3f36e2h
)}(h
}(h │   Please specify which column(s) you would like to calculate the variance for.
 ╰──● sending message to User
 TaskWeaver ▶  The data file d:/demo_data.csv is loaded. The columns in the data are:

1. TimeBucket
2. Count
h}(hj}  hj~  hKCuhh	uh(hhh}h}(hj  hj  usub$75ce5529-7ed6-4163-9c51-986d646e337ah
)}(h
}(hPlease specify which column(s) you would like to calculate the variance for.

Human   ▶  Count
 ╭───< Planner >
 ├─► [init_plan]
 │   1. load the data fileh}(hj}  hj~  hKDuhh	uh(hhh}h}(hj  hj  usub$8d100afa-961d-457d-b872-ca6aecdf1e91h
)}(h
}(h │   2. confirm the columns to calculate variance <interactively depends on 1>
 │   3. calculate the variance value for the selected columns <sequentially depends on 2>
 │   4. report the result to the user <interactively depends on 3>h}(hj}  hj~  hKEuhh	uh(hhh}h}(hj  hj  usub$53161019-14eb-46f8-8396-69717193dbcdh
)}(h
}(h ├─► [plan]
 │   1. instruct CodeInterpreter to load the data file
 │   2. confirm the columns to calculate variance
 │   3. instruct CodeInterpreter to calculate the variance value for the Count column
 │   4. report the result to the userh}(hj}  hj~  hKFuhh	uh(hhh}h}(hj  hj  usub$bb10d0c7-6956-414d-9609-57fd782d0d1eh
)}(h
}(hX
   ├─► [current_plan_step] 3. instruct CodeInterpreter to calculate the variance value for the Count columnor the Count col  <=�=>        
 ├──● Please calculate the variance value for the Count column in the loaded datalumn in the loaded          <=�=>h}(hj}  hj~  hKGuhh	uh(hhh}h}(hj  hj  usub$883b425f-757a-402f-9264-e997072fa078h
)}(h
}(hX   ╰──● sending message to CodeInterpreter
 ╭───< CodeInterpreter >
 ├─► [thought] ProgramApe will generate a code snippet to calculate the variance value for the Count column in the loaded data. column in the loaded dat <=�=>         h}(hj}  hj~  hKHuhh	uh(hhh}h}(hj  hj  usub$4c857b7a-3514-4969-a984-88ef09d915fdh
)}(h
}(h ├─► [python]▶ [CodeInterpreter][python] variance_count = df['Count'].var() varia     <=�=>     
 │   variance_count = df['Count'].var()
 │   variance_count
 ├─► [verification] NONEh}(hj}  hj~  hKIuhh	uh(hhh}h}(hj  hj  usub$18dab894-7ad5-426c-8f95-8cea8574c5d7h
)}(h
}(hX(   ├─► [code_error] No code verification is performed.
 ├─► [execution_status] SUCCESS
 ├─► [execution_result]
 │   The execution of the generated python code above has succeeded
 │   
 │   The result of above Python code after execution is:h}(hj}  hj~  hKJuhh	uh(hhh}h}(hj  hj  usub$fa02b9d2-f390-471d-a288-050595a80a6ch
)}(h
}(h │   263514550.45479834
 ├─► [artifact_paths] []
 ├──● The following python code has been executed:
 │   ```python
 │   variance_count = df['Count'].var()
 │   variance_count
 │   ```
 │   
 │   h}(hj}  hj~  hKKuhh	uh(hhh}h}(hj   hj!  usub$fa4c46ff-736c-4f28-b3f9-f7266f0a0f2bh
)}(h
}(hX   │   The execution of the generated python code above has succeeded
 │   
 │   The result of above Python code after execution is:
 │   263514550.45479834
 ╰──● sending message to Planner
 ╭───< Planner >CodeInterpreter] finished         <=�=> h}(hj}  hj~  hKLuhh	uh(hhh}h}(hj)  hj*  usub$1cbb8b46-6f76-4ada-8b2d-f1ed92077592h
)}(h
}(h ├─► [init_plan]
 │   1. load the data file
 │   2. confirm the columns to calculate variance <interactively depends on 1>
 │   3. calculate the variance value for the selected columns <sequentially depends on 2>h}(hj}  hj~  hKMuhh	uh(hhh}h}(hj2  hj3  usub$4763cee4-b425-4e44-81de-a5e838a9496fh
)}(h
}(h │   4. report the result to the user <interactively depends on 3>
  ├─► [plan]
 │   1. instruct CodeInterpreter to load the data file
 │   2. confirm the columns to calculate varianceh}(hj}  hj~  hKNuhh	uh(hhh}h}(hj;  hj<  usub$8041aaa7-5ae4-40b4-a493-1c4c5d1f8275h
)}(h
}(h │   3. instruct CodeInterpreter to calculate the variance value for the Count column
 │   4. report the result to the user
 ├─► [current_plan_step] 4. report the result to the user result to the u       <=�=>   h}(hj}  hj~  hKOuhh	uh(hhh}h}(hjD  hjE  usub$0de8d32f-9caf-45e3-984e-3f362cd4e0bfh
)}(h
}(h ├──● The variance value for the Count column in the data file d:/demo_data.csv is 263,514,550.45a.csv is 263,51    <=�=>      
 ╰──● sending message to Userh}(hj}  hj~  hKPuhh	uh(hhh}h}(hjM  hjN  usub$c8f51273-56c0-49fe-b448-f7c92298c14ch
)}(h
}(hp TaskWeaver ▶  The variance value for the Count column in the data file d:/demo_data.csv is 263,514,550.45
```h}(hj}  hj~  hKQuhh	uh(hhh}h}(hjV  hjW  usub$455bec15-d71b-4285-9da9-905243a24d4fh
)}(h
}(h# Overview
<!-- <div>
<h1 style={{display: flex, padding: 0 20px}}>
<img 
  src={require('../static/img/logo_light.png').default}
  alt="Logo"
  style={{width: 80, height: 80, display: "inline"}}
/>h}(hdocs\overview.mdh
# OverviewhK uhh	uh(hhh}h}(hj_  hj`  usub$360c2ff9-1d51-4638-a536-cdde8ba5f38ah
)}(h
}(hX  <span style={{lineHeight: "80px"}}>TaskWeaver</span>
</h1> -->
<!-- </div> -->

**TaskWeaver** is a *code-first* agent framework for seamlessly planning and executing data analytics tasks. 
This innovative framework interprets user requests through coded snippets and efficiently h}(hja  hjb  hKuhh	uh(hhh}h}(hjj  hjk  usub$ccad785e-1761-4adc-840a-bd2c242c04f7h
)}(h
}(hX#  coordinates a variety of plugins in the form of functions to execute 
data analytics tasks.

**Highlighted Features**

- [x] **Rich data structure** - TaskWeaver allows you to work with rich data 
    structures in Python, such as DataFrames, instead of having to work with 
    text strings.h}(hja  hjb  hKuhh	uh(hhh}h}(hjs  hjt  usub$28cfc23e-196f-4fe5-a175-b1156c05aacfh
)}(h
}(h- [x] **Customized algorithms** - TaskWeaver allows you to encapsulate your 
    own algorithms into plugins (in the form of Python functions), 
    and orchestrate them to achieve complex tasks.h}(hja  hjb  hKuhh	uh(hhh}h}(hj|  hj}  usub$1dfb06f3-97eb-4b6b-9821-2e0eba1056feh
)}(h
}(h- [x] **Incorporating domain-specific knowledge** - TaskWeaver is designed to 
    easily incorporate domain-specific knowledge, such as the knowledge 
    of execution flow, to improve the reliability of the AI copilot.h}(hja  hjb  hKuhh	uh(hhh}h}(hj  hj  usub$2e47f3ee-9748-4f9e-8288-d9264bb882e7h
)}(h
}(hX  - [x] **Stateful conversation** - TaskWeaver is designed to support stateful 
    conversation. It can remember the context of the conversation and 
    leverage it to improve the user experience.
- [x] **Code verification** - TaskWeaver is designed to verify the generated code h}(hja  hjb  hKuhh	uh(hhh}h}(hj  hj  usub$6783a7ec-03ab-410c-9e47-91efdb19ddbfh
)}(h
}(hX      before execution. It can detect potential issues in the generated code 
    and provide suggestions to fix them.
- [x] **Easy to use** - TaskWeaver is designed to be easy to use. 
    We provide a set of sample plugins and a tutorial to help you get started.h}(hja  hjb  hKuhh	uh(hhh}h}(hj  hj  usub$db00240e-1d98-47e3-8820-ec1c3b879215h
)}(h
}(h    Users can easily create their own plugins based on the sample plugins.
    TaskWeaver offers an out-of-the-box experience, allowing users to run a service immediately after installation.
- [x] **Easy to debug** - TaskWeaver is designed to be easy to debug. h}(hja  hjb  hKuhh	uh(hhh}h}(hj  hj  usub$0805f46b-0739-49f8-9543-b4989ae4462ch
)}(h
}(h    We have detailed logs to help you understand what is going on during calling the LLM, 
    the code generation, and execution process.
- [x] **Security consideration** - TaskWeaver supports a basic session management to keeph}(hja  hjb  hKuhh	uh(hhh}h}(hj  hj  usub$ffad2fcb-4f56-4a8b-99be-34939024db46h
)}(h
}(hX      different users' data separate. The code execution is separated into different processes in order not to interfere with each other.
- [x] **Easy extension** - TaskWeaver is designed to be easily extended to accomplish 
    more complex tasks. You can create multiple AI copilots toh}(hja  hjb  hK	uhh	uh(hhh}h}(hj  hj  usub$b852bdee-434b-40d7-8b67-b6909bcc693bh
)}(h
}(hJ    act in different roles, and orchestrate them to achieve complex tasks.h}(hja  hjb  hK
uhh	uh(hhh}h}(hj  hj  usub$05c348a6-f0f2-41e7-b5c7-32f6532ca38bh
)}(h
}(h# Planner

In TaskWeaver, the Planner is responsible for generating a plan to accomplish the user's task.
The plan is a sequence of steps, where each step will be executed by the Code Interpreter.h}(hdocs\planner.mdh	# PlannerhK uhh	uh(hhh}h}(hj  hj  usub$9941fe89-9723-47ae-92c1-25fe4c6ec53bh
)}(h
}(hX+  Taken the response from the Code Interpreter or new requests from the user as input, the Planner will update the plan and move on to the next step.

## Planner Configuration

- `planner.example_base_path`:	The folder to store planner examples. The default value is `${AppBaseDir}/planner_examples`. h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$e63af09b-4a2e-415e-9635-5e03357b2b4ch
)}(h
}(hIf you want to create your own planner examples, you can add them to this folder. More details about `example` can referred to [example](./customization/example/example.md).h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$990865d7-59e5-485e-baad-d3a5b0e6d761h
)}(h
}(hX	  - `planner.prompt_compression`: At times, lengthy conversations with the Planner may exceed the input limitations of the LLM model. 
To address this issue, we can compress the chat history and send it to the LLM model. The default value for this setting is `false`.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$3899bb50-cabb-4b56-be50-8ea33a28f18bh
)}(h
}(hMore details about `prompt_compression` can be referred to [prompt_compression](./compression).
- `planner.skip_planning`: In certain scenarios, there may be no need to use the Planner to generate complex plans for simple tasks. h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$ce34913b-a057-4340-afaa-b5aa88086f89h
)}(h
}(hzFor instance, if a user wants to count the rows in a data file, the request can be sent directly to the Code Interpreter. h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$b9564fa1-7d45-489f-b60f-c7e257f282c3h
)}(h
}(hWhen the majority of user requests involve simple tasks, enabling this option will create a dummy plan that is sent alongside the user request to the Code Interpreter directly, without LLM generation process.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$0a34e444-12db-4b6b-9425-2d508eadc149h
)}(h
}(hX  The fixed dummy plan is shown in [dummy_plan.json](https://github.com/microsoft/TaskWeaver/blob/main/taskweaver/planner/dummy_plan.json).
Here is a chat example:
`````bash
=========================================================
 ___         _     _       __h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$bdf7ab04-9bed-4893-a494-b822170eb689h
)}(h
}(hn|_   _|_ _ ___| | _ | |     / /__  ___ __   ___  ___
  | |/ _` / __| |/ /| | /| / / _ \/ __ `/ | / / _ \/ ___/h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$cdb7294e-9f56-4bae-b39b-b17b908d1f49h
)}(h
}(h  | | (_| \__ \   < | |/ |/ /  __/ /_/ /| |/ /  __/ /
  |_|\__,_|___/_|\_\|__/|__/\___/\__,_/ |___/\___/_/
=========================================================h}(hj  hj  hK	uhh	uh(hhh}h}(hj  hj  usub$712e8ab0-6bc6-4c0d-bfeb-1c234d45f57fh
)}(h
}(hX  TaskWeaver: I am TaskWeaver, an AI assistant. To get started, could you please enter your request?
Human: generate 10 random numbers
>>> [MESSAGE]
Please process this request: generate 10 random numbers
>>> [SEND_TO]
CodeInterpreterh}(hj  hj  hK
uhh	uh(hhh}h}(hj   hj!  usub$f7657e0a-54f0-4d68-836f-4f0e0f25e25eh
)}(h
}(h>>> [INIT_PLAN]
1. ask Code Interpreter to handle the request; 2. report the result to user <interactively depends on 1>
>>> [PLAN]
1. ask Code Interpreter to handle user\'s request; 2. report the result to user
>>> [CURRENT_PLAN_STEP]h}(hj  hj  hKuhh	uh(hhh}h}(hj)  hj*  usub$600bcbc4-d37d-442c-adc9-015604b54f14h
)}(h
}(hX  1. ask Code Interpreter to handle the request
>>> [PLANNER->CODEINTERPRETER]
Please process this request: generate 10 random numbers
>>> [PYTHON]
random_numbers = np.random.rand(10)
random_numbers
>>> [VERIFICATION]
NONE
>>> [STATUS]
SUCCESSh}(hj  hj  hKuhh	uh(hhh}h}(hj2  hj3  usub$c16be645-f5b6-4676-912e-127c4ba671e1h
)}(h
}(h>>> [RESULT]
The execution of the generated python code above has succeeded

The result of above Python code after execution is:
array([0.65294462, 0.26946084, 0.06244879, 0.78520418, 0.87067657,h}(hj  hj  hK
uhh	uh(hhh}h}(hj;  hj<  usub$2efb01be-a14e-466b-bd3e-d2ff1f0ff32dh
)}(h
}(h       0.24208003, 0.60249788, 0.30921069, 0.83811521, 0.05135891])
>>> [CODEINTERPRETER->PLANNER]
The following python code has been executed:
```pythonh}(hj  hj  hKuhh	uh(hhh}h}(hjD  hjE  usub$c6a95503-b017-4920-bb34-0ff5c94454b4h
)}(h
}(hrandom_numbers = np.random.rand(10)
random_numbers
```

The execution of the generated python code above has succeeded

The result of above Python code after execution is:h}(hj  hj  hKuhh	uh(hhh}h}(hjM  hjN  usub$86ad350a-df58-48f5-bdf2-6558a3d125d8h
)}(h
}(hBarray([0.65294462, 0.26946084, 0.06244879, 0.78520418, 0.87067657,h}(hj  hj  hKuhh	uh(hhh}h}(hjV  hjW  usub$8179f395-2cc8-4c82-b018-96b0924ec49dh
)}(h
}(hl       0.24208003, 0.60249788, 0.30921069, 0.83811521, 0.05135891])
>>> [INIT_PLAN]ting...      <=�=>     h}(hj  hj  hKuhh	uh(hhh}h}(hj_  hj`  usub$671fefdb-0129-443e-9db4-75118fe12f02h
)}(h
}(h1. ask Code Interpreter to handle the request; 2. report the result to user <interactively depends on 1>
>>> [PLAN]
1. ask Code Interpreter to handle user's request; 2. report the result to user
>>> [CURRENT_PLAN_STEP]
2. report the result to userh}(hj  hj  hKuhh	uh(hhh}h}(hjh  hji  usub$b86de484-c52a-42f6-9408-b6e3b51d1e8eh
)}(h
}(h >>> [SEND_TO]
User
>>> [MESSAGE]h}(hj  hj  hKuhh	uh(hhh}h}(hjq  hjr  usub$134c5d79-f62d-477c-b07b-5714fbcfbc54h
)}(h
}(hThe random numbers are as follows: [0.65294462, 0.26946084, 0.06244879, 0.78520418, 0.87067657, 0.24208003, 0.60249788, 0.30921069, 0.83811521, 0.05135891]h}(hj  hj  hKuhh	uh(hhh}h}(hjz  hj{  usub$45920e4d-6d1d-497f-9991-d256257c6f69h
)}(h
}(h>>> [PLANNER->USER]h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$0b993da8-d21b-49cc-8d3b-a4a5b2896f5fh
)}(h
}(hThe random numbers are as follows: [0.65294462, 0.26946084, 0.06244879, 0.78520418, 0.87067657, 0.24208003, 0.60249788, 0.30921069, 0.83811521, 0.05135891]h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$e69c90e4-0965-4941-82ae-20bb9245fcbah
)}(h
}(hTaskWeaver: The random numbers are as follows: [0.65294462, 0.26946084, 0.06244879, 0.78520418, 0.87067657, 0.24208003, 0.60249788, 0.30921069, 0.83811521, 0.05135891]h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$b6a41991-9347-415c-b108-f530ef5f7580h
)}(h
}(h`````h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$e67f9aba-0663-4bd5-adf1-fcd6efe76e68h
)}(h
}(hX  # Quick Start

## Installation
You can install TaskWeaver by running the following command:
```bash
# [optional] create a conda environment to isolate the dependencies
# conda create -n taskweaver python=3.10
# conda activate taskweaver

# clone the repositoryh}(hdocs\quickstart.mdh
# Quick StarthK uhh	uh(hhh}h}(hj  hj  usub$8b49b2bc-5e21-42b1-b9b5-ea7f2620844bh
)}(h
}(hX  git clone https://github.com/microsoft/TaskWeaver.git
cd TaskWeaver
# install the requirements
pip install -r requirements.txt
```

## Project Directory
TaskWeaver runs as a process, you need to create a project directory to store plugins and configuration files. h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$5b3d890f-282e-4320-948a-2efe9c8075f6h
)}(h
}(hX  We provided a sample project directory in the `project` folder. You can copy the `project` folder to your workspace.
A project directory typically contains the following files and folders:

```bash
📦project
 ┣ 📜taskweaver_config.json # the configuration file for TaskWeaverh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$1bd380b6-ec1a-4028-a091-0ccdda0b394ah
)}(h
}(hX
   ┣ 📂plugins # the folder to store plugins
 ┣ 📂planner_examples # the folder to store planner examples
 ┣ 📂codeinterpreter_examples # the folder to store code interpreter examples
 ┣ 📂sample_data # the folder to store sample data used for evaluationsh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$fb5e72e0-019e-4202-b8a0-06e3feb157dah
)}(h
}(hX   ┣ 📂logs # the folder to store logs, will be generated after program starts
 ┗ 📂workspace # the directory to store session data, will be generated after program starts
    ┗ 📂 session_id 
      ┣ 📂ces # the folder used by the code execution serviceh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$9876c76e-219b-4855-a93f-eb10322ae25fh
)}(h
}(hX        ┗ 📂cwd # the current working directory to run the generated code
```

## OpenAI Configuration
Before running TaskWeaver, you need to provide your OpenAI API key and other necessary information. 
You can do this by editing the `taskweaver_config.json` file. h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$42212a69-d7cc-4fd5-a29a-4d07e1f20e2fh
)}(h
}(hIf you are using Azure OpenAI, you need to set the following parameters in the `taskweaver_config.json` file:
### Azure OpenAI
```json
{
"llm.api_base": "https://xxx.openai.azure.com/",
"llm.api_key": "your_api_key",h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$3f1e8c75-456f-4caf-9280-3b3406d9114bh
)}(h
}(h"llm.api_type": "azure",
"llm.api_version": "the api version",
"llm.model": "the model name, e.g., gpt-4"
}
```

### OpenAI
```json
{
"llm.api_key": "the api key",h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$0c736466-6413-45f3-ab53-3dab45e6970dh
)}(h
}(h~"llm.model": "the model name, e.g., gpt-4"
}
```
>💡 Only the latest OpenAI API supports the `json_object` response format. h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$5c510305-a0fc-46da-b7b1-abdde89fbf46h
)}(h
}(hX  > If you are using an older version of OpenAI API, you need to set the `llm.response_format` to `null`.

More configuration options can be found in the [configuration documentation](configurations.md).
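
As a minimal sketch, the OpenAI settings above could also be written to `taskweaver_config.json` from Python. The helper name `write_taskweaver_config` is hypothetical and not part of TaskWeaver, and treating `llm.api_key` and `llm.model` as required is an assumption based only on the OpenAI example shown above.

```python
import json
from pathlib import Path


def write_taskweaver_config(project_dir, settings):
    """Write settings to <project_dir>/taskweaver_config.json and return the path.

    Hypothetical helper for illustration; it is not part of TaskWeaver.
    """
    # Assumption: only the two keys from the OpenAI example are checked here.
    missing = [key for key in ("llm.api_key", "llm.model") if not settings.get(key)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))

    config_path = Path(project_dir) / "taskweaver_config.json"
    # Create the project directory if it does not exist yet.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return config_path
```

For example, `write_taskweaver_config("./project", {"llm.api_key": "the api key", "llm.model": "gpt-4"})` would produce the same file as the OpenAI JSON example, overwriting any existing `taskweaver_config.json` in that folder.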

## Start TaskWeaver
```bash
# assume you are in the taskweaver folderh}(hj  hj  hK	uhh	uh(hhh}h}(hj  hj  usub$b07456be-1aaf-4a50-a36b-7590d22cea9fh
)}(h
}(hX8  # -p is the path to the project directory
python -m taskweaver -p ./project/
```
This will start the TaskWeaver process and you can interact with it through the command line interface. 
If everything goes well, you will see the following prompt:

```bash
=========================================================h}(hj  hj  hK
uhh	uh(hhh}h}(hj  hj  usub$5cb4f098-fdb5-4614-9585-926cc6f1f041h
)}(h
}(h ___         _     _       __
|_   _|_ _ ___| | _ | |     / /__  ___ __   ___  ___
  | |/ _` / __| |/ /| | /| / / _ \/ __ `/ | / / _ \/ ___/h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj
  usub$3b7b8bff-7ecb-4447-af20-ec02234782f2h
)}(h
}(h  | | (_| \__ \   < | |/ |/ /  __/ /_/ /| |/ /  __/ /
  |_|\__,_|___/_|\_\|__/|__/\___/\__,_/ |___/\___/_/
=========================================================h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$015cecf6-46a7-4a84-aadf-9b0b62ab70a7h
)}(h
}(hqTaskWeaver: I am TaskWeaver, an AI assistant. To get started, could you please enter your request?
Human: ___
```h}(hj  hj  hK
uhh	uh(hhh}h}(hj  hj  usub$d942d278-a55a-40ea-acfe-e916eb75b1b6h
)}(h
}(h# Running pytest

## quickstart

### setup

make sure you completed the [taskweaver quickstart](https://github.com/microsoft/TaskWeaver#quick-start)

### run tests

- execute `./scripts/run_pytest.sh` from the project root

### notes
h}(hdocs\run_pytest.mdh# Running pytesthK uhh	uh(hhh}h}(hj'  hj(  usub$1ba99f03-f9dc-4ed7-9aea-7a3cbb4db3c5h
)}(h
}(h- the script temporarily sets `PYTHONPATH` and runs pytest
- no permanent environment changes
- pass additional pytest arguments as needed
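
the notes above can be sketched in Python. this is not the contents of `run_pytest.sh`, which is not shown here; the function names are hypothetical, and putting the project root on `PYTHONPATH` is an assumption about what the script does.

```python
import os
import subprocess
import sys


def env_with_pythonpath(project_root, base_env=None):
    """Return a copy of the environment with project_root prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    # Only the copy is modified, so the change does not persist after the run.
    env["PYTHONPATH"] = project_root if not existing else project_root + os.pathsep + existing
    return env


def build_pytest_command(extra_args=()):
    """Build the pytest command line, forwarding any additional arguments."""
    return [sys.executable, "-m", "pytest", *extra_args]


def run_pytest(project_root=".", extra_args=()):
    """Run pytest from project_root with the temporary PYTHONPATH; return the exit code."""
    root = os.path.abspath(project_root)
    result = subprocess.run(
        build_pytest_command(extra_args),
        cwd=root,
        env=env_with_pythonpath(root),
    )
    return result.returncode
```

calling `run_pytest(".", ["-k", "planner"])` from the project root would forward `-k planner` to pytest; the filter value is only an illustration.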

done! 🚀h}(hj)  hj*  hKuhh	uh(hhh}h}(hj2  hj3  usub$119c4d9c-8fc3-42f4-8b33-d7d049df3f18h
)}(h
}(hX  # Session

`session` is the entry point of TaskWeaver. 
It is responsible for the communication between the user and TaskWeaver.
Refer to [taskweaver_as_a_lib](./usage/library.md) to see how to set up a TaskWeaver session and start chatting with TaskWeaver.
h}(hdocs\session.mdh	# SessionhK uhh	uh(hhh}h}(hj;  hj<  usub$80e53d3a-f3c7-4364-9e23-eb1a4628511fh
)}(h
}(hX  ## Session Configration
- `max_internal_chat_round_num`: the maximum number of internal chat rounds between Planner and Code Interpreter. 
  If the number of internal chat rounds exceeds this number, the session will be terminated. 
  The default value is `10`.h}(hj=  hj>  hKuhh	uh(hhh}h}(hjF  hjG  usub$87b60512-b41b-44c7-a82a-104ef2338f79h
)}(h
}(hXJ  - `code_interpreter_only`: allow users to directly communicate with the Code Interpreter.
   In this mode, users can only send messages to the Code Interpreter and receive messages from the Code Interpreter.
   Here is an example:
`````bash
 =========================================================
 ___         _     _       __h}(hj=  hj>  hKuhh	uh(hhh}h}(hjO  hjP  usub$c2761c19-378d-492c-9078-1be6fe3a611eh
)}(h
}(hn|_   _|_ _ ___| | _ | |     / /__  ___ __   ___  ___
  | |/ _` / __| |/ /| | /| / / _ \/ __ `/ | / / _ \/ ___/h}(hj=  hj>  hKuhh	uh(hhh}h}(hjX  hjY  usub$968d3095-d002-42b5-b11d-5745840c39a1h
)}(h
}(h  | | (_| \__ \   < | |/ |/ /  __/ /_/ /| |/ /  __/ /
  |_|\__,_|___/_|\_\|__/|__/\___/\__,_/ |___/\___/_/
=========================================================h}(hj=  hj>  hKuhh	uh(hhh}h}(hja  hjb  usub$f1f6f594-ed93-43a8-ae28-1472c68ccfa5h
)}(h
}(hTaskWeaver: I am TaskWeaver, an AI assistant. To get started, could you please enter your request?
Human: generate 10 random numbers
>>> [PYTHON]
import numpy as np
random_numbers = np.random.rand(10)
random_numbers
>>> [VERIFICATION]
NONEh}(hj=  hj>  hKuhh	uh(hhh}h}(hjj  hjk  usub$84d1c8ae-52b2-45b3-bcfb-c61d5e339b4ch
)}(h
}(h>>> [STATUS]Starting...         
SUCCESS
>>> [RESULT]
The execution of the generated python code above has succeeded

The result of above Python code after execution is:h}(hj=  hj>  hKuhh	uh(hhh}h}(hjs  hjt  usub$84ca77a4-78e5-4b3b-aafb-cac37485c891h
)}(h
}(hBarray([0.09918602, 0.68732778, 0.44413814, 0.4756623 , 0.48302334,h}(hj=  hj>  hKuhh	uh(hhh}h}(hj|  hj}  usub$1504077d-902f-48f8-8998-29edbe5028e5h
)}(h
}(h       0.8286594 , 0.80994359, 0.35677263, 0.45719317, 0.68240194])
>>> [CODEINTERPRETER->PLANNER]
The following python code has been executed:
```python
import numpy as nph}(hj=  hj>  hKuhh	uh(hhh}h}(hj  hj  usub$f0911757-893d-4b4e-9d5c-c12d46fb76abh
)}(h
}(hrandom_numbers = np.random.rand(10)
random_numbers
```

The execution of the generated python code above has succeeded

The result of above Python code after execution is:h}(hj=  hj>  hK	uhh	uh(hhh}h}(hj  hj  usub$7c52e04b-8739-4e1e-b5ec-4c8026e1605bh
)}(h
}(hBarray([0.09918602, 0.68732778, 0.44413814, 0.4756623 , 0.48302334,h}(hj=  hj>  hK
uhh	uh(hhh}h}(hj  hj  usub$82202a18-d1de-4711-8888-7e7bc885dadah
)}(h
}(h       0.8286594 , 0.80994359, 0.35677263, 0.45719317, 0.68240194])
TaskWeaver: The following python code has been executed:
```python
import numpy as np
random_numbers = np.random.rand(10)h}(hj=  hj>  hKuhh	uh(hhh}h}(hj  hj  usub$03831956-9858-438e-936f-54ac40d77496h
)}(h
}(hrandom_numbers
```

The execution of the generated python code above has succeeded

The result of above Python code after execution is:
array([0.09918602, 0.68732778, 0.44413814, 0.4756623 , 0.48302334,h}(hj=  hj>  hKuhh	uh(hhh}h}(hj  hj  usub$382760e1-f83e-4e38-ad3c-66d526de692ah
)}(h
}(hI       0.8286594 , 0.80994359, 0.35677263, 0.45719317, 0.68240194])
`````h}(hj=  hj>  hK
uhh	uh(hhh}h}(hj  hj  usub$30ef6c3b-e254-482c-a707-ac4033ccfcc6h
)}(h
}(hTaskWeaver: A Code-First Agent Framework
Bo Qiao∗ Liqun Li∗ Xu Zhang∗ Shilin He∗ Yu Kang Chaoyun Zhang
Fangkai Yang Hang Dong Jue Zhang Lu Wang Minghua Ma Pu Zhao Si Qinh}(hdocs\taskweaver.pdf      h(TaskWeaver: A Code-First Agent FrameworkhK uhh	uh(hhh}h}(hj  hj  usub$c6d52736-655c-4002-9ab0-66e370f852a6h
)}(h
}(hX  Xiaoting Qin Chao Du Yong Xu Qingwei Lin Saravan Rajmohan
Dongmei Zhang
Microsoft
taskweaver@microsoft.com
Abstract
Large Language Models (LLMs) have shown impressive abilities in natural lan-
guage understanding and generation, leading to their use in applications such ash}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$a43cbf27-0635-4105-b508-5e41ae0bbca1h
)}(h
}(hXE  chatbots and virtual assistants. However, existing LLM frameworks face limita-
tions in handling domain-specific data analytics tasks with rich data structures.
Moreover, they struggle with flexibility to meet diverse user requirements. To ad-
dress these issues, TaskWeaver is proposed as a code-first framework for buildingh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$4d58b480-72a4-482c-b0fb-ac4ec5e97d43h
)}(h
}(hXE  LLM-powered autonomous agents. It converts user requests into executable code
and treats user-defined plugins as callable functions. TaskWeaver provides support
for rich data structures, flexible plugin usage, and dynamic plugin selection, and
leverages LLM coding capabilities for complex logic. It also incorporates domain-h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$7593c9a4-9b06-4503-9820-53232913cc4ah
)}(h
}(hXU  specific knowledge through examples and ensures the secure execution of generated
code. TaskWeaver offers a powerful and flexible framework for creating intelligent
conversational agents that can handle complex tasks and adapt to domain-specific
scenarios. The code is open-sourced at https://github.com/microsoft/TaskWeaver/.
1 Introductionh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$354b9c23-dac9-42d4-b349-b865b11a7498h
)}(h
}(hX"  Large Language Models (LLMs), such as GPT [ 18,12], Claude [ 11], Palm [ 10], and Llama [ 19],
have demonstrated remarkable capabilities in natural language understanding and generation. These
models have been widely used in various applications, including chatbots, virtual assistants, andh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$dbd0dd50-8600-449b-9c71-a851362834e7h
)}(h
}(hX  content-generation systems. There is a growing potential for LLMs to revolutionize the way humans
interact with machines, providing a more natural and intuitive experience.
An agent, specifically those that utilize Large Language Models (LLMs) or other AI technologies, ish}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$81378448-6fe6-4994-921f-46b13704e1ech
)}(h
}(hregarded as an autonomous entity that possesses the ability to plan tasks, observe its surroundings, and
execute appropriate actions accordingly [ 20,22]. Several existing frameworks, including Langchainh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$7797bd1e-4896-4feb-9e0e-0860e3f1ec82h
)}(h
}(hX  [6], Semantic Kernel [8], Transformers Agent [9], Agents [25], AutoGen [1], and JARVIS [5], have
endeavored to utilize LLMs for task-oriented conversations. These frameworks enable users to
interact with LLM-powered agents by issuing natural language requests and receiving responses inh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$e3f6525c-5060-4c6d-bd2e-0ebf87a827dah
)}(h
}(hreturn. Nevertheless, these frameworks possess limitations that constrain their efficacy in handling
domain-specific scenarios and data analytics tasks.
∗Equal Contributionh}(hj  hj  hK	uhh	uh(hhh}h}(hj  hj  usub$3b2d9820-273e-484e-8662-64f7b7c006f7h
)}(h
}(hPreprint. Under review.arXiv:2311.17541v2  [cs.AI]  1 Dec 2023 One major limitation is that most existing frameworks lack native support to handle rich data
structures. LLM-powered agents often need to work with complex data structures, such as nestedh}(hj  hj  hK
uhh	uh(hhh}h}(hj  hj  usub$49b57148-794b-4518-a5bc-60ff0410a980h
)}(h
}(hX-  lists, dictionaries, or data frames, for data analytics applications and many other business scenarios.
However, many existing frameworks struggle to handle these structures efficiently, particularly
when it comes to transferring information between chat rounds or across different plugins. In suchh}(hj  hj  hKuhh	uh(hhh}h}(hj   hj!  usub$6f8ed314-83af-46f3-916c-0b1b67370a69h
)}(h
}(hXo  cases, these frameworks either persist data to disk or encode complex structures as strings or JSON
objects in the prompts. Although these approaches are functional, they can lead to impracticality and
increased error rates, particularly when dealing with large datasets.
Another limitation of existing approaches is the lack of configuration for incorporating domainh}(hj  hj  hKuhh	uh(hhh}h}(hj)  hj*  usub$e6246f34-ac04-4c03-93c4-01116fde25cch
)}(h
}(hXa  knowledge. While these frameworks provide tools and examples for prompt engineering, they fail to
offer a systematic way to embed domain-specific knowledge into the planning and code-generation
process. Consequently, the limitation makes it challenging to control the planning and code generation
process in accordance with specific domain requirements.h}(hj  hj  hK
uhh	uh(hhh}h}(hj2  hj3  usub$0884417c-651b-481b-9637-03e31bdbdc13h
)}(h
}(hX+  Another issue encountered in many existing frameworks is their inflexibility, which hinders the ability
to meet the diverse requirements of users. Although plugins can address common needs, they may
fall short when it comes to handling ad-hoc queries. Writing a separate plugin for each ad-hoc queryh}(hj  hj  hKuhh	uh(hhh}h}(hj;  hj<  usub$1ba21226-71e1-4fa4-ab98-783846810af7h
)}(h
}(hX  is impractical. In these situations, it becomes necessary for the agent to be able to write custom code
to execute the user’s query. Therefore, there is a need for a solution that seamlessly integrates plugin
execution with custom code execution to address this issue.h}(hj  hj  hKuhh	uh(hhh}h}(hjD  hjE  usub$76a3c4e1-c2fc-4235-bdd2-6bcd407e541fh
)}(h
}(hX  To address these limitations, we propose TaskWeaver, a code-first framework for building LLM-
powered autonomous agents. The standout feature of TaskWeaver is its ability to convert each
user request into executable code, treating user-defined plugins as callable functions. TaskWeaverh}(hj  hj  hKuhh	uh(hhh}h}(hjM  hjN  usub$62773a95-37db-45ad-902c-e4aaf8bf3d30h
)}(h
}(hX!  overcomes the limitations of existing frameworks by providing support for rich data structures,
flexible plugin usage, and dynamic plugin selection. It leverages the coding capability of LLMs to
implement complex logic and incorporates domain-specific knowledge through examples. Addition-h}(hj  hj  hKuhh	uh(hhh}h}(hjV  hjW  usub$3830be65-8582-4772-bfc2-e031ef4d1750h
)}(h
}(hXU  ally, TaskWeaver has made considerable efforts towards the secure execution of generated code and
provides an easy-to-use interface for developers.
In this paper, we present the design and implementation of TaskWeaver, along with several case
studies that demonstrate its effectiveness in handling various tasks. Overall, TaskWeaver providesh}(hj  hj  hKuhh	uh(hhh}h}(hj_  hj`  usub$ca4dec1a-079c-402d-93cd-378f92817e1dh
)}(h
}(hX  a powerful and flexible framework for building intelligent conversational agents that can handle
complex tasks and adapt to domain-specific scenarios.
2 Motivation and Requirements
To illustrate the TaskWeaver approach, let’s consider a real-world use case – conducting anomalyh}(hj  hj  hKuhh	uh(hhh}h}(hjh  hji  usub$dc4dc13f-85b2-460b-9a0d-2583318dae90h
)}(h
}(hX"  detection on time series data stored in an SQL database. Our goal is to apply a specialized anomaly
detection algorithm to this data, requiring two input columns of types timestamp and float, respectively.
The expected conversation between the user and the AI assistant proceeds as follows:h}(hj  hj  hKuhh	uh(hhh}h}(hjq  hjr  usub$196e7ec4-0bf1-4b9e-b759-aa5752d25757h
)}(h
}(hThe example demonstrates common actions in a data analysis task, which include obtaining data,
performing analysis on the data, and visualizing the results. By examining the steps in this use case,
we can identify key requirements imposed by this example:h}(hj  hj  hKuhh	uh(hhh}h}(hjz  hj{  usub$d489bbdb-7339-4677-b259-673bda7190ebh
)}(h
}(hX  •Plugin: TaskWeaver must support invoking custom plugins. In this example, plugins are needed to
pull data from the database and implement the specialized anomaly detection algorithm.
•Rich data structure: TaskWeaver must be capable of handling data in complex structures, such ash}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$9c665972-0437-4b3f-859d-9a05ab62b15ah
)}(h
}(hXU  pandas DataFrame, to perform advanced data processing actions. Data in rich structure should be
able to transfer easily from one plugin to another.
•Stateful execution: TaskWeaver engages in iterative interactions with the user, processing user
inputs and executing tasks accordingly. The execution state is preserved throughout the entireh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$f75af4c0-aa4f-43a8-9fe6-a799bae6a2a4h
)}(h
}(hXQ  conversation session across multiple chat rounds.
•Reasoning and action (ReAct): TaskWeaver is unaware of the data schema stored in the database
prior to reading it. To generate the anomaly detection code, TaskWeaver must first inspect the data
schema and then input the corresponding column names into the anomaly detection algorithm.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$e02fbc39-840d-421b-8dee-5d8f231ab1fch
)}(h
}(h2 Figure 1: Chat between the user and the assistant.
•Response in natural language: TaskWeaver consistently responds to the user in human-readable
natural language. Generally, the anomaly detection algorithm returns a DataFrame, but TaskWeaverh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$0832ff6f-84e8-4550-8d18-b910d23ab52dh
)}(h
}(hX  needs to provide a summary of the execution result, such as “ 11 anomalies were detected ”.
•Code generation: TaskWeaver must generate code to accommodate ad-hoc user demands, which
are not covered by the pre-defined plugins. In the example provided, TaskWeaver generates code toh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$d7da44d6-8f10-42f7-b5a6-6d62ceae9db7h
)}(h
}(hX$  visualize the detected anomalies.
•Incorporating domain knowledge: TaskWeaver should provide a systematic way to incorporate
domain-specific knowledge. This would help LLMs plan better and make accurate tool calls,
which in turn produces reliable results, particularly in complex domains.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$2d75ebb2-a0f2-4991-b206-c07ae671601ch
)}(h
}(hX  •Persisting artifact: TaskWeaver should offer a means of saving results, such as DataFrames or
images, to persistent storage. Users can download the artifacts via the provided links.
We have so far summarized the requirements from our motivating example. In the following section,h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$4537dcc1-6dd1-413b-a96d-a08532eb4ffah
)}(h
}(hX  we are going to describe the design of TaskWeaver.
3 A Glance of TaskWeaver
Fig. 2 presents an overview of TaskWeaver, which consists of three key components: the Planner,
Code Generator (CG), and Code Executor (CE). The Code Generator and Code Executor consisth}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$2e046ba9-8174-41b5-af1f-64d17e9db2d6h
)}(h
}(hX6  of the Code Interpreter (CI). The Planner serves as the system’s entry point and interacts with the
user. Its responsibilities include: (1) planning – breaking down the user’s request into subtasks and
managing the execution process with self-reflection; and (2) responding – transforming the executionh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$cafabcb0-a915-4df4-82be-3fcf7b085892h
)}(h
}(hXY  result into a human-readable response for the user. The CG generates code for a given subtask from
the Planner, considering existing plugins to enable the generated code to incorporate function calls for
specific tasks. The examples within the CG guide it, particularly for domain-specific tasks unfamiliar
3 Figure 2: The overview of TaskWeaverh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$15137051-88b1-48e9-8c03-f71f0c3776deh
)}(h
}(hto the LLM. Lastly, the CE is responsible for executing the generated code and maintaining the
execution state throughout the entire session.
Recall that we have motivated our design with an example of pulling data from a database andh}(hj  hj  hK uhh	uh(hhh}h}(hj  hj  usub$388afe7d-547c-4ae8-9d80-82fce88aa8aeh
)}(h
}(happlying a custom anomaly detection algorithm to the data. We now explain how a sub-task of it is
accomplished in TaskWeaver. The workflow is illustrated in Fig. 3. The prompts shown in Fig. 3 ish}(hj  hj  hK!uhh	uh(hhh}h}(hj  hj  usub$1dbd83f8-307c-4865-b7a6-5194ba132f30h
)}(h
}(hX<  simplified and only for illustration purpose due to the space limitation. The actual ones are much
more complicated.
The initial step involves the Planner taking the user query, Code Interpreter (CI) description, and,
if provided, planning examples to generate a plan. The CI description outlines its code generationh}(hj  hj  hK"uhh	uh(hhh}h}(hj  hj  usub$a440ca44-e84a-4a56-bfd0-2f9a40ccc6f0h
)}(h
}(hX1  and execution capabilities. To enhance the Planner’s effectiveness in task planning, the description
includes details of the available plugins. The output of the Planner is a step-by-step plan, according
to which the Planner phrases the queries and communicates with the CI. The first step (highlighted)h}(hj  hj  hK#uhh	uh(hhh}h}(hj  hj  usub$64534f0c-50a2-439a-97ce-7c4d07bf05a6h
)}(h
}(hX  consists of pulling data from the database and describing the data schema.
The Code Generator (CG) prompt delineates its profile and competencies, providing comprehensive
definitions of all the relevant plugins. This includes the function name, its description, the argumentsh}(hj  hj  hK$uhh	uh(hhh}h}(hj  hj  usub$6b1db948-2cdb-469b-be91-f3f1a4d56fceh
)}(h
}(hX5  it accepts, and what it returns. Additionally, code generation examples may be incorporated into the
prompt to steer the code generation process. The output from the CG is a code snippet that executes
the sql_pull_data plugin, retrieves the data into a DataFrame, and provides a description of the data
schema.h}(hj  hj  hK%uhh	uh(hhh}h}(hj
  hj  usub$291c077d-cb69-4450-b872-61abf7cce7bfh
)}(h
}(hX2  The Code Executor (CE)’s execution result is sent back to the Planner to determine the next step in the
plan. In practice, the Planner may modify its original plan if the outcome differs from expectations.
In our example, the execution result reveals two columns, namely date andvalue , in the DataFrame.h}(hj  hj  hK&uhh	uh(hhh}h}(hj  hj  usub$022e0d13-19d9-4f40-99db-ceeaf1dbaedbh
)}(h
}(hX'  For the next step, the Planner can either confirm with the user if these columns correspond to the
two input parameters ts_col and val_col of the anomaly_detection plugin, or directly proceed to
the third step, as it is reasonable to assume that date andvalue likely represent ts_col andval_col ,h}(hj  hj  hK'uhh	uh(hhh}h}(hj  hj  usub$7c55a543-6518-4333-966f-f0969827d73dh
)}(h
}(hX  respectively. Regardless, TaskWeaver must first retrieve the data and understand its schema before
making a decision for the second step, which involves a self-reflection process.
As shown in this example, TaskWeaver incorporates a two-layer planning process during the handlingh}(hj  hj  hK(uhh	uh(hhh}h}(hj%  hj&  usub$0a8f95e3-bbb2-431f-aa22-60745e74c317h
)}(h
}(hX/  of user requests. The first layer consists of the Planner generating a high-level plan outlining the steps
required to fulfill the request. Subsequently, in each round, the Code Generator must devise a plan, in
terms of chain-of-thought and generated code, to execute the specified step.
3.1 Terminologyh}(hj  hj  hK)uhh	uh(hhh}h}(hj.  hj/  usub$ff694562-e6c4-4dc4-b1dc-4477b2dc6f1dh
)}(h
}(hX/  We introduce the terminology in this paper.
Session A new session is established once the user submits their initial request to TaskWeaver.
This session is terminated if the user intentionally resets the conversation or when a predetermined
expiration time is reached following the last interaction.h}(hj  hj  hK*uhh	uh(hhh}h}(hj7  hj8  usub$5e53552d-23a6-4b8b-980d-e070614d7b54h
)}(h
}(hX  4 You are the Planner who can coordinate CodeInterpreter to finish the user 
task. 
# The characters involved in the conversation
## User Character :
## CodeInterpreter Character :
## Planner Character :
# Let’s start the conversation!
<PlanningExamples>Planner
Name: sql_pull_datah}(hj  hj  hK+uhh	uh(hhh}h}(hj@  hjA  usub$65554449-9e85-4cb7-9f32-b171ca8be646h
)}(h
}(hX  Description : Pull data from a SQL database.
Parameters :
- name: query
  type: str
  description: This is the query in natural 
language that the user wants to get data 
from database.
Returns:
- name: df
  type: dataframe
  description: This DataFrame contains the 
query results.h}(hj  hj  hK,uhh	uh(hhh}h}(hjI  hjJ  usub$75255178-ffef-4126-bf05-a6457d9f12f0h
)}(h
}(hX0  - name: description
  type: str
  description: The description of the data.PluginsCode Interpreter Description
Name: anomaly_detection
Description : identifies anomalies from an 
input DataFrame.
Parameters :
- name: Df
  type: dataframe
  description: The input dataframe .
- name: ts_col
  ...
Returns:h}(hj  hj  hK-uhh	uh(hhh}h}(hjR  hjS  usub$63ff2036-545b-4cd2-aad3-948f78b5f427h
)}(h
}(h- name: df
  type: dataframe
  ...CodeInterpreter:
- description: generating and running Python 
code to complete the subtasks.
- plugins: descriptions of the plugins
Plan
1. pull data from db://table and describe the data 
schemah}(hj  hj  hK.uhh	uh(hhh}h}(hj[  hj\  usub$7f8aee98-156c-4648-a272-9a71f40a2819h
)}(h
}(h2. confirm if all required columns (ts_col, val_col) are 
presented in the data
3. perform anomaly detection on the data
4. report the results to user
CodeGenerator
Code
```python
query = "pull data from db://table"
df = sql_pull_data(query)h}(hj  hj  hK/uhh	uh(hhh}h}(hjd  hje  usub$ff188dd2-5ebd-4a00-9820-67c9fedbe42ch
)}(h
}(hcolumns = “, ”.join([col for col in df.columns])
descriptions = f"Data has been pull from db://table and the schema of the data is: {columns}"
```
CodeExecutor# CodeGenerator:
- CodeGenerator can understand the user request and generate h}(hj  hj  hK0uhh	uh(hhh}h}(hjm  hjn  usub$0b947a3a-710d-4abb-bb24-5669c809057ch
)}(h
}(hX,  syntactically correct python code to complete tasks.
- CodeGenerator can utilize pre-defined plugins in the form 
of python functions to achieve tasks.
# Plugins:
<Plugins>
# ConversationExamples:
<CodeGenerationExamples># CodeGenerator:
- CodeGenerator can understand the user request and generate h}(hj  hj  hK1uhh	uh(hhh}h}(hjv  hjw  usub$6b76450e-9f4f-42e2-90f6-484895bd7ff9h
)}(h
}(hsyntactically correct python code to complete tasks.
- CodeGenerator can utilize pre-defined plugins in the form 
of python functions to achieve tasks.
# Plugins:
<Plugins>
# ConversationExamples:
<CodeGenerationExamples>
Execution Results
state = 0h}(hj  hj  hK2uhh	uh(hhh}h}(hj  hj  usub$e8fe95af-5b00-409b-91d0-4bfd157192f7h
)}(h
}(hX  descriptions = “Data has been pull from db://table and the schema of the 
data is: date, valuePull data from db://table and apply anomaly detection on the data.Figure 3: Workflow of TaskWeaver
Round Around commences with TaskWeaver receiving a user request and concludes upon provid-h}(hj  hj  hK3uhh	uh(hhh}h}(hj  hj  usub$c9df2fa8-6665-43b6-8a3a-fdb7334bca90h
)}(h
}(hX%  ing a response to the user. Subsequently, TaskWeaver awaits the user’s next input. Typically, a session
consists of numerous rounds. Responses to the user may either involve a message after completing a
specific task or TaskWeaver requesting further input to successfully carry out the task.h}(hj  hj  hK4uhh	uh(hhh}h}(hj  hj  usub$1f051a7d-31c5-42d7-9580-dfbbf88bf6b6h
)}(h
}(hX*  Post Within a round, there can be several messages exchanged between the Planner, CG, and CE,
each of which is referred to as a post. A post contains the text message, such as the request from the
Planner to CG for code generation, and may also include other relevant information like the generatedh}(hj  hj  hK5uhh	uh(hhh}h}(hj  hj  usub$707ea15c-30dc-4ec6-8f65-b1d3535242ebh
)}(h
}(hXF  code and execution logs.
State TaskWeaver retains essential information throughout the conversation, known as the state (or
context) of the session. The state encompasses the chat history between the user and TaskWeaver, the
model’s internal thoughts and plans, generated code, execution results, and more. The session stateh}(hj  hj  hK6uhh	uh(hhh}h}(hj  hj  usub$4e8fda59-3eb0-49d6-a098-f6eb57fba71ch
)}(h
}(hX/  functions as a short-term memory for the LLM, guiding it towards making appropriate decisions.
4 Design Considerations
In this section, we discuss the design considerations.
4.1 Code-First Analysis Experience
Python has emerged as the de-facto language for data analysis, and TaskWeaver leverages thish}(hj  hj  hK7uhh	uh(hhh}h}(hj  hj  usub$c7002bc4-0c11-4f6f-a384-035c77e26442h
)}(h
}(hX-  popularity by converting user requests into Python programs that run on dedicated processes. Users
can issue natural language requests to manipulate the state of the Python process with TaskWeaver,
which utilizes popular libraries like numpy, pandas, sklearn, etc. Unlike other frameworks that rely onh}(hj  hj  hK8uhh	uh(hhh}h}(hj  hj  usub$30e6eeb3-1f7d-4c3e-894f-a2b0778f07d0h
)}(h
}(hXC  text or file-based expressions, TaskWeaver utilizes native data structures such as pandas DataFrame.
This makes it easy to perform tasks such as pulling data from a database, running machine learning
algorithms (e.g., anomaly detection, classification or clustering), summarizing results, and visualizing
analysis outcomes.h}(hj  hj  hK9uhh	uh(hhh}h}(hj  hj  usub$39ade554-1499-449f-8d01-eaab0f1db75dh
)}(h
}(hXO  4.2 Restricted Code Generation
The TaskWeaver approach involves a process of code verification (before code execution) to ensure
that the generated code conforms to pre-defined rules and does not violate any safety requirements.
This is achieved through the use of configurable rules that can be customized to suit different scenarios.h}(hj  hj  hK:uhh	uh(hhh}h}(hj  hj  usub$4136b882-b83e-408c-815c-e0ef167ad54fh
)}(h
}(hXT  By doing so, users can ensure that the code does not import any forbidden packages or invoke any
unsafe functions. This verification process not only addresses safety concerns, but also supports
plugin-only mode functionality where only the plugin functions can be called, and all other generated
code is denied.
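The verification step described above can be illustrated with a minimal sketch based on Python's standard ast module. This is not TaskWeaver's actual verifier: the rule sets FORBIDDEN_MODULES and FORBIDDEN_CALLS and the function verify_code are hypothetical names chosen for illustration, and only direct calls by name are checked.
```python
import ast

# Hypothetical, configurable rule sets (assumptions for illustration only).
FORBIDDEN_MODULES = {"os", "subprocess", "shutil"}
FORBIDDEN_CALLS = {"eval", "exec", "open"}


def verify_code(source, allowed_plugins=None):
    """Return a list of rule violations; an empty list means the code passes.

    If allowed_plugins is given, any direct call to a name outside that set
    is reported, which approximates a plugin-only mode.
    """
    violations = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    violations.append(f"forbidden import: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in FORBIDDEN_MODULES:
                violations.append(f"forbidden import: {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            name = node.func.id
            if name in FORBIDDEN_CALLS:
                violations.append(f"forbidden call: {name}")
            elif allowed_plugins is not None and name not in allowed_plugins:
                violations.append(f"non-plugin call: {name}")
    return violations
```
Because the check runs on the syntax tree before execution, rejected code is never run. A production verifier would also need to handle attribute calls, aliases, and dynamic imports, which this sketch ignores.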
4.3 Stateful Code Executionh}(hj  hj  hK;uhh	uh(hhh}h}(hj  hj  usub$b34f1675-819a-43de-a27a-46865392d1d1h
)}(h
}(hX#  When users make ad-hoc requests for data analysis, it often involves multiple iterations. As a result,
TaskWeaver needs to maintain the state of code execution throughout the entire session. This is
similar to programming in Python using Jupyter Notebook, where users type code snippets in ah}(hj  hj  hK<uhh	uh(hhh}h}(hj  hj  usub$780c07ff-129b-4146-b03d-92a85337b6b5h
)}(h
}(hX:  sequence of cells and the program’s internal state progresses sequentially. However, in TaskWeaver,
users use natural language instead of a programming language. TaskWeaver converts each user
request into one or more code snippets in each round, depending on the specific plan.
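The notebook-like behavior can be sketched as follows, assuming a single in-process namespace that is reused for every snippet. The class StatefulExecutor is a hypothetical name, and this simplification runs code in the current process rather than in a dedicated worker process.
```python
class StatefulExecutor:
    """Run code snippets one after another against a shared namespace."""

    def __init__(self):
        # The namespace plays the role of the session's execution state.
        self.namespace = {}

    def run(self, code):
        # exec is used only for illustration; it must not be applied to
        # untrusted code without verification and isolation.
        exec(code, self.namespace)
        # Return the user-visible variables, hiding interpreter internals.
        return {k: v for k, v in self.namespace.items() if not k.startswith("__")}
```
Variables defined in one round remain available in later rounds, so a snippet such as y = x * 21 can build on an earlier x = 2 without recomputing it.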
4.4 Intelligent Plan Decompositionh}(hj  hj  hK=uhh	uh(hhh}h}(hj  hj  usub$452d2d35-6b5a-494e-b8c4-251a4d5eabffh
)}(h
}(hX)  The Planner decomposes users’ requests into sub-tasks that are simplified and executable by the
Code Interpreter. We identified three types of dependencies between different sub-tasks: sequential,
interactive, and none. Sub-task Bis sequentially dependent on sub-task Ameans that Bmust run afterh}(hj  hj  hK>uhh	uh(hhh}h}(hj  hj  usub$99dc7838-84bb-4743-949c-10012d5bf36dh
)}(h
}(hAis done. Interactive dependency is the same as sequential in the execution order. More than that, it
means there is a need for intervention by a human or a LLM between the two sub-tasks. For example,h}(hj  hj  hK?uhh	uh(hhh}h}(hj  hj  usub$536618a1-f944-4901-8ef3-c6855a85e2f1h
)}(h
}(hthe user instructs TaskWeaver to read a file and follow the instructions in its content. The Planner
split the request into 2 steps: (1) read the file content; (2) follow the instructions. The LLM needsh}(hj  hj  hK@uhh	uh(hhh}h}(hj  hj  usub$6d1beea6-0ef8-48a5-b44e-abb1fafed9e5h
)}(h
}(hX,  to complete the first step to read the content before carrying out the second step. The third type of
dependency is none, which means the sub-tasks are independent and can be executed in parallel.
According to this definition, two steps where one sequentially depends on the other could be mergedh}(hj  hj  hKAuhh	uh(hhh}h}(hj	  hj	  usub$172e0ce3-7e3f-427b-b082-25f7e290ad3fh
)}(h
}(hX%  into one step for code generation because no intervention is required. In TaskWeaver, the Planner
performs two-phase planning: it is required to reconsider the generated plan of sub-tasks to
merge sub-tasks with sequential dependency. This can prevent the model from splitting the requesth}(hj  hj  hKBuhh	uh(hhh}h}(hj	  hj	  usub$13c8071c-7b8e-4eea-b45c-81b4571860dah
)}(h
}(hinto too fine-grained steps leading to a prolonged execution process and incurring unnecessary costs
(i.e., many calls to the LLM).
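The merging rule can be made concrete with a small sketch. In TaskWeaver the merge is carried out by the LLM during the second planning phase; the code below only mimics the rule deterministically, and the names SubTask and merge_sequential are hypothetical.
```python
from dataclasses import dataclass, field


@dataclass
class SubTask:
    description: str
    depends_on: list = field(default_factory=list)  # 1-based indices of earlier steps
    dependency: str = "none"  # "sequential", "interactive", or "none"


def merge_sequential(plan):
    """Merge each sequentially dependent step into the step group it depends on."""
    groups = []    # each group is a list of sub-task descriptions
    group_of = {}  # original step index -> index of the group holding it
    for idx, task in enumerate(plan, start=1):
        targets = {group_of[d] for d in task.depends_on}
        if task.dependency == "sequential" and len(targets) == 1:
            # No intervention is needed, so the step joins its prerequisite.
            g = targets.pop()
            groups[g].append(task.description)
        else:
            # Interactive or independent steps stay as separate steps.
            groups.append([task.description])
            g = len(groups) - 1
        group_of[idx] = g
    return [", ".join(group) for group in groups]
```
For a plan that loads a file, extracts the column names, counts the rows, and then reports to the user, the first three steps collapse into one and the interactive reporting step remains separate.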
4.5 Self-Reflection
TaskWeaver is designed with the capacity to rectify errors throughout the planning and code generationh}(hj  hj  hKCuhh	uh(hhh}h}(hj	  hj	  usub$4c51b862-f29d-4402-a1f5-234a67ff99b6h
)}(h
}(hX3  stages. When it detects that the outcomes of the preceding steps diverge from the anticipated results,
the Planner possesses the capability to reassess and modify its plan, exploring alternative approaches.
Furthermore, the Code Interpreter (CI) is equipped to evaluate the results of code execution. Shouldh}(hj  hj  hKDuhh	uh(hhh}h}(hj!	  hj"	  usub$8c8512cb-9618-4c1d-96b2-9792b9fc2a9ch
)}(h
}(hX3  the code execution process encounter an exception or the code fails to pass the verification, the CI
can initiate a re-generation of the code, thereby attempting to correct the code. This iterative process
ensures that TaskWeaver maintains a robust and adaptable approach to task execution, increasing theh}(hj  hj  hKEuhh	uh(hhh}h}(hj*	  hj+	  usub$ec17ce2d-a3f5-4a4c-8dbe-5a81aea7da15h
)}(h
}(hX  reliability and efficiency of the framework.
4.6 Scalable Plugin Usage
In TaskWeaver, plugins are specialized Python functions used to handle tasks that are either too
complex or require specific domain knowledge, thereby eliminating the number of plugins neededh}(hj  hj  hKFuhh	uh(hhh}h}(hj3	  hj4	  usub$cf31a34b-827f-45fd-997e-336d62eaad4fh
)}(h
}(hX(  since TaskWeaver can already handle general Python code generation. Furthermore, TaskWeaver
features dynamic plugin selection. This means that after a user request is received, only the plugins
that are relevant to that request are selected from a pool of available plugins. This approach ensuresh}(hj  hj  hKGuhh	uh(hhh}h}(hj<	  hj=	  usub$56a0fd95-d4a8-4d12-8d5d-d4a01bb655abh
)}(h
}(hX  that TaskWeaver uses the most appropriate tools for the task at hand without overloading the prompt
with unnecessary functions.
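Dynamic plugin selection can be illustrated with a deliberately simple relevance score. The sketch below ranks plugins by word overlap between the request and each plugin's name and description; this scoring is an assumption made for illustration, and a real system could use a different relevance measure such as embedding similarity. The function select_plugins and the send_email plugin in the usage note are hypothetical.
```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_plugins(request, plugins, top_k=2):
    """Return up to top_k plugin names that share words with the request.

    plugins maps a plugin name to its natural-language description.
    """
    request_tokens = tokenize(request)
    scored = []
    for name, description in plugins.items():
        plugin_tokens = tokenize(name.replace("_", " ") + " " + description)
        overlap = len(request_tokens & plugin_tokens)
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]
```
Given the request "Pull data from db://table and apply anomaly detection on the data." and a pool containing sql_pull_data, anomaly_detection, and an unrelated send_email plugin, only the first two are kept, so the prompt carries just the relevant function definitions.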
4.7 Incorporating Domain Knowledge
One way to incorporate domain knowledge is by defining custom plugins, as discussed previously.h}(hj  hj  hKHuhh	uh(hhh}h}(hjE	  hjF	  usub$b127bda5-dd5c-4c48-993c-df718c6cf9abh
)}(h
}(hX)  However, for domain-specific tasks, it can be challenging for the LLM to generate the correct code
to call the plugins or to make a good plan. To address this, we have introduced an interface in
TaskWeaver that allows users to guide the system for such difficult tasks. TaskWeaver enables users toh}(hj  hj  hKIuhh	uh(hhh}h}(hjN	  hjO	  usub$27334802-33b3-4be1-8e3b-b25377448d94h
)}(h
}(hX%  configure examples to teach the LLM how to respond to certain requests. For instance, a conversation
history containing step-by-step thoughts and request/response sections can be used as an example.
There are two types of examples in TaskWeaver: one is used for planning and the other for codeh}(hj  hj  hKJuhh	uh(hhh}h}(hjW	  hjX	  usub$6afde013-01f6-4830-b640-4ef6a646b538h
)}(h
}(hX  generation. By using examples, TaskWeaver can incorporate domain-specific knowledge and improve
the LLM’s ability to generate accurate plans and code for difficult tasks.
4.8 Security
Executing freely generated code can introduce security risks. For example, a malicious user may askh}(hj  hj  hKKuhh	uh(hhh}h}(hj`	  hja	  usub$484eec5f-15cb-4343-9468-d446c96bb495h
)}(h
}(hX*  TaskWeaver to open a file containing security keys, delete system files, or terminate a process. A
typical way to mitigate the security risks is to include certain statements in the prompts to the LLM,
which is implemented in TaskWeaver. To prevent these malicious behaviors, TaskWeaver allocates ah}(hj  hj  hKLuhh	uh(hhh}h}(hji	  hjj	  usub$c958ccfd-0912-4986-a7ce-d67eb7ab4e67h
)}(h
}(hX(  separate worker process for each session, and each worker process is independent of other processes
under a special OS user. As a result, the worker process is unable to access files or other processes
beyond the OS user’s scope, thus preventing any malicious behavior. The worker processes areh}(hj  hj  hKMuhh	uh(hhh}h}(hjr	  hjs	  usub$7562ee21-c009-4a04-85f5-232942d9468eh
)}(h
}(hX  communicated via an inter-process interface. In the future, we plan to support running each worker
process in a sandbox to further enhance the security of the system.
4.9 Easy to Use
Existing LLM frameworks such as Langchain make it easy to build proof-of-concept demos. However,h}(hj  hj  hKNuhh	uh(hhh}h}(hj{	  hj|	  usub$ab34ba44-3cbc-46e3-87bb-a5c176483926h
)}(h
}(hX(  building a reliable system with these frameworks can be time-consuming due to the large number of
components they provide (like a large box of LEGO bricks). One of the main goals of TaskWeaver is
to make it easy to use. Typically, users only need to customize the plugins to get started. For moreh}(hj  hj  hKOuhh	uh(hhh}h}(hj	  hj	  usub$2571add6-f338-483b-81c3-b6d7f3c3b701h
)}(h
}(hX-  difficult tasks, users can customize the examples for code generation and planning. To aid users,
TaskWeaver includes various tools to help, such as a tool to convert Python functions into plugins and
a tool to save existing conversations as examples. These tools make it easier for users to customizeh}(hj  hj  hKPuhh	uh(hhh}h}(hj	  hj	  usub$23dab9d5-9cb5-4206-a43a-afc3b5e6f351h
)}(h
}(hX*  the system. Another key feature of TaskWeaver is its support for multi-tenant open box. The system
implements a session manager to isolate different user sessions, making it easy to serve multiple
users at once. This enables the system to be used as a service, supporting a wide range of use cases.h}(hj  hj  hKQuhh	uh(hhh}h}(hj	  hj	  usub$87d0fe34-fb91-4b45-bbc4-bd56eb692bfbh
)}(h
}(h4.10 LLM Cost
The cost of calling LLMs can be significant, and TaskWeaver addresses this issue by letting different
modules be configured with different LLM models. For instance, GPT 3.5 is much cheaper comparedh}(hj  hj  hKRuhh	uh(hhh}h}(hj	  hj	  usub$88381bcb-a3ea-488f-8126-b5a33fd623b9h
)}(h
}(hX  to GPT 4. Therefore, for simpler tasks, we may use a cheaper model to reduce the overall cost.
In addition, we have carefully analyzed the process in TaskWeaver and incorporated a set of rules to
bypass calling the LLM under certain conditions.
7 5 TaskWeaver in Detailh}(hj  hj  hKSuhh	uh(hhh}h}(hj	  hj	  usub$1417cdc3-29fd-44a7-8727-ad49e8349f49h
)}(h
}(hX  We are going to explain each module in one section accordingly.
5.1 Planner
The Planner serves as a crucial component in the TaskWeaver system, where it decomposes requests
from users into several sub-tasks and orchestrates capabilities within TaskWeaver to complete the taskh}(hj  hj  hKTuhh	uh(hhh}h}(hj	  hj	  usub$25414e78-9d0c-4e48-a444-8688e892c685h
)}(h
}(hX  and report back to the users. From a high-level perspective, the Planner functions as the entry point
and controller, managing the entire system. As illustrated in Figure 2, the Planner communicates
bidirectionally with each component, sending queries and receiving responses.h}(hj  hj  hKUuhh	uh(hhh}h}(hj	  hj	  usub$71602b7e-7d1d-4cee-897c-ef2e090ea474h
)}(h
}(hAs shown in Fig. 3, a typical working routine of the Planner is as follows: The Planner first receives
a query from users and then decides to decompose it into multiple sub-tasks. These sub-tasksh}(hj  hj  hKVuhh	uh(hhh}h}(hj	  hj	  usub$b050f948-c8ee-4936-bb1a-afcf58b3a5b1h
)}(h
}(hX/  essentially form a “Initial Plan”, which is generated based on the knowledge of LLMs or enhanced
by domain-specific “Examples”. After drafting the initial plan, the Planner is requested to refine
this initial plan by considering the dependencies among the sub-tasks as discussed in Sec. 4.4 in ah}(hj  hj  hKWuhh	uh(hhh}h}(hj	  hj	  usub$23a856f0-1110-4121-a606-c3fc7f3704bfh
)}(h
}(hchain-of-thought manner. The Planner may merge sub-tasks into one in its “Final Plan”. Table 1
shows two examples of the Planner’s initial plan and final plan w.r.t the User’s request. In the firsth}(hj  hj  hKXuhh	uh(hhh}h}(hj	  hj	  usub$0df8b1ce-aeaf-4d63-a65a-2d6a6830b578h
)}(h
}(hX  example, the initial plan has 4 steps while the first 3 steps are merged into one in the final plan. In the
second example, the final plan is the same as the initial plan because the Planner needs to read the file
content and then understand what would be the next step.h}(hj  hj  hKYuhh	uh(hhh}h}(hj	  hj	  usub$53d4e830-d0c3-422f-b534-015bde6df192h
)}(h
}(hUser request: Load data from a.csv, show the column names, and count the number of rows
Initial Plan:
1. load a.csv
2. extract and display the column names <sequentially depends on 1>
3. count the number of rows <sequentially depends on 1>h}(hj  hj  hKZuhh	uh(hhh}h}(hj	  hj	  usub$dfc215ab-ec85-47c0-a320-643f8d326641h
)}(h
}(hX  4. report the information to the user <interactively depends on 2,3>
Final Plan:
1. load a.csv, extract the column names, and count the number of rows
2. report the information to the user
User request: Read file manual.txt and follow the instructions in it
Initial Plan Final Planh}(hj  hj  hK[uhh	uh(hhh}h}(hj	  hj	  usub$17ed4a6a-b9f5-4a23-b6ae-e82e2cff5e15h
)}(h
}(hX  1. read manual.txt and show its content
2.follow the instructions according to the file
content <interactively depends on 1>
3.report the result to the user <interactively
depends on 2>1. read manual.txt and show its content
2.follow the instructions according to the file
contenth}(hj  hj  hK\uhh	uh(hhh}h}(hj	  hj	  usub$52b5e325-f83d-42f4-b91e-e709295cfc4ah
)}(h
}(h3. report the result to the user
Table 1: Two examples of the initial plans and the corresponding final plans.
After the plan is finalized, the Planner takes action by assigning each sub-task, with phrased queries,h}(hj  hj  hK]uhh	uh(hhh}h}(hj
  hj
  usub$623acdad-8107-42d4-8d80-317f7ed33b27h
)}(h
}(hX-  to ask the CI to generate a code snippet, leveraging in-domain plugins when necessary. The execution
results are sent back to the Planner. Following the ReAct (reasoning and act) design pattern, upon
observing the execution results, the Planner may update its plan, request additional information fromh}(hj  hj  hK^uhh	uh(hhh}h}(hj
  hj
  usub$2661725e-9b34-44af-93f0-b86e64196d0eh
)}(h
}(hX$  users, and so on. The process is repeated for subsequent sub-tasks until the entire plan is completed.
When addressing domain-specific scenarios where the LLM’s own knowledge is insufficient, the
Planner can incorporate external knowledge with “Examples”. These examples are tailored byh}(hj  hj  hK_uhh	uh(hhh}h}(hj
  hj
  usub$783c477a-ad19-4ede-b1d9-0dea1995b3b1h
)}(h
}(hX'  scenario developers according to their usage requirements. One example is essentially the chat history
between the User, the Planner, and the Code Interpreter, including the plans of the Planner. The
Examples could be saved from an online conversation for the purpose of stabilizing the planningh}(hj  hj  hK`uhh	uh(hhh}h}(hj
  hj
  usub$4c8903fd-1bec-4bff-b8cf-3c3c6dea70e7h
)}(h
}(h8 process for future requests, or they could be manually prepared. A real planning Example is provided
in Appendix B.
5.2 Code Generator (CG)
Code Generator is designed to utilize LLMs to automatically synthesize a Python code snippet basedh}(hj  hj  hKauhh	uh(hhh}h}(hj&
  hj'
  usub$ff97c2cd-32a5-4a0d-af90-0403b6d17190h
)}(h
}(hX   on an incoming request. From a high-level view, CG combines the benefits of both the plugin system
and code interpreter, allowing the capability of invoking plugins and generating additional code.
Inside the CG, plugins and examples customized by users are leveraged to generate the code.h}(hj  hj  hKbuhh	uh(hhh}h}(hj/
  hj0
  usub$c24d3f60-83fe-4e38-9189-f14a20aff083h
)}(h
}(hX  1.Plugin: In general, plugins can take on various forms such as a web API call, a software module,
a customized algorithm, or a deep learning model. Regardless of the form, all variants can be
invoked by a function call. Therefore, we have encapsulated the plugins as a Python functionh}(hj  hj  hKcuhh	uh(hhh}h}(hj8
  hj9
  usub$e62810b1-20d5-4b07-8e21-4f29d0e94fd8h
)}(h
}(hwithin the Code Generator for seamless invocation.
2.Example: To aid the LLMs in adapting to domain-specific scenarios, examples are designed to
help guide LLMs to behave by following the examples. Examples act as a guide for the LLMs toh}(hj  hj  hKduhh	uh(hhh}h}(hjA
  hjB
  usub$60936ed0-9e0a-4dd1-b278-9bf92caf9e04h
)}(h
}(hXW  follow and improve their performance by providing contextualized examples. This feature ensures
that the generated code aligns with the specific requirements of the user’s domain. A real code
generation Example is provided in Appendix C.
The CG has the ability to generate code that exclusively calls upon plugins, code that does not use anyh}(hj  hj  hKeuhh	uh(hhh}h}(hjJ
  hjK
  usub$5c60b4dd-79c4-4509-9764-b6bc61c60edch
)}(h
}(hX8  plugins, or a combination of both. The code that doesn’t depend on plugins is specifically tailored to
handle requests that cannot be met using only existing plugins. In the motivating example presented
in Section 2, there is a plugin called ‘anomaly_detection’ that can identify anomalies in data. In thish}(hj  hj  hKfuhh	uh(hhh}h}(hjS
  hjT
  usub$f4027908-c18d-4c06-871e-ea816ea6b5cfh
)}(h
}(hX4  scenario, the CG would first invoke the plugin through a single line of code. Subsequently, the CG
would generate multiple lines of additional code to visualize the detected anomalies, as requested by
the user.
The knowledge of LLMs is often limited, particularly in domain-specific scenarios. In many cases,h}(hj  hj  hKguhh	uh(hhh}h}(hj\
  hj]
  usub$f3e46a89-38b9-4f2e-a195-393dc0ae47f6h
)}(h
}(hX<  there are in-domain tools better suited for completing a specific task. Therefore, the Plugin system
can enhance LLMs by enabling them to leverage existing tools. The Plugin system consists of the
following two components:
1.Plugin Schema LLMs need to understand the capabilities of a plugin, including its argumentsh}(hj  hj  hKhuhh	uh(hhh}h}(hje
  hjf
  usub$baeca6d0-d101-48ea-ba9d-55a9468b7f5bh
)}(h
}(hX  and return values. This information is embedded in the plugin schema file (in YAML format)
and could be customized by users. The plugin schema comprises the name, metadata, plugin
description, arguments, and return values. For each argument and return value, users shouldh}(hj  hj  hKiuhh	uh(hhh}h}(hjn
  hjo
  usub$38bf8e69-4df5-4ee6-be46-422ff2d30259h
)}(h
}(hX5  provide its name, type, and description. The plugin schema is then supplied to LLMs as part of
the prompt to generate the code.
2.Plugin Implementation The plugin implementation defines how each plugin is executed using
Python in Code Executor, where the arguments and return values should correspond with theh}(hj  hj  hKjuhh	uh(hhh}h}(hjw
  hjx
  usub$c6932a0b-64cd-4cb2-a4fd-b094a0933405h
)}(h
}(hX  Plugin Schema. Various plugins can be implemented in Python code, such as a domain-specific
algorithm, a software module, a deep learning model or a Web API call. It is important to note that
LLMs do not need to know the plugin implementation details to generate code. On the contrary,h}(hj  hj  hKkuhh	uh(hhh}h}(hj
  hj
  usub$f6814559-c0a4-4fb5-924a-ae322a7b54d5h
)}(h
}(hthe Plugin Schema is used for code generation.
Appendix D shows an example of the plugin schema and implementation.
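To make the split between schema and implementation concrete, the following is a minimal sketch of how a plugin schema might be held in memory and rendered into prompt text. The class names, the to_prompt method, and the rendered layout are assumptions for illustration only (they are not TaskWeaver's actual API), and the metadata field is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """One argument or return value: name, type, and description."""
    name: str
    type: str
    description: str


@dataclass
class PluginSchema:
    """Hypothetical in-memory form of a plugin schema file."""
    name: str
    description: str
    args: List[Field] = field(default_factory=list)
    returns: List[Field] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the schema as text that could be placed in the LLM prompt."""
        signature = ", ".join(f"{a.name}: {a.type}" for a in self.args)
        lines = [f"{self.name}({signature})", f"  {self.description}"]
        lines += [f"  arg {a.name} ({a.type}): {a.description}" for a in self.args]
        lines += [f"  returns {r.name} ({r.type}): {r.description}" for r in self.returns]
        return "\n".join(lines)


# Illustrative schema; the field values are made up for this sketch.
schema = PluginSchema(
    name="anomaly_detection",
    description="Detect anomalies in a time series with a 3-sigma rule.",
    args=[
        Field("df", "DataFrame", "input time series"),
        Field("time_col_name", "str", "name of the time column"),
        Field("value_col_name", "str", "name of the value column"),
    ],
    returns=[
        Field("df", "DataFrame", "input data with an anomaly flag"),
        Field("description", "str", "summary of the detection result"),
    ],
)
prompt_text = schema.to_prompt()
print(prompt_text)
```

Only the rendered schema text would reach the LLM; the function body behind the plugin stays in the Code Executor, which matches the point that LLMs do not need the implementation details.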
5.2.1 Examples
Adapting Large Language Models (LLMs) to domain-specific scenarios presents a significant chal-h}(hj  hj  hKluhh	uh(hhh}h}(hj
  hj
  usub$2e23d24e-538d-4df4-88c4-28e8af26a0f4h
)}(h
}(hX#  lenge in enabling LLMs to comprehend user queries and generate code for invoking plugins accurately
and reliably. Without proper guidance, LLMs cannot acquire domain-specific knowledge, and there-
fore, they may fail to chain the appropriate plugins to complete a task. The Example system ish}(hj  hj  hKmuhh	uh(hhh}h}(hj
  hj
  usub$4ac076ef-5aae-4f0e-9ce2-001eb57edd81h
)}(h
}(hX)  designed to provide a conversation-level demonstration for LLMs to emulate its behavior. It follows
a Chain-of-Thought format before generating the code, which can be customized by users through a
YAML file. The examples are also included as part of the prompt sent to LLMs in the Code Generator,h}(hj  hj  hKnuhh	uh(hhh}h}(hj
  hj
  usub$b3595b1e-8318-4b8f-a25c-53cff359ee42h
)}(h
}(hX   alongside the plugin schema.
5.2.2 Post Verification
Even with explicit instructions added to the prompt, such as restricting the call to plugin functions,
prohibiting local file system modifications, or disallowing certain package installations, the LLMh}(hj  hj  hKouhh	uh(hhh}h}(hj
  hj
  usub$e9fc8a4d-099e-4f1a-be23-37b57a2b277ch
)}(h
}(hX4  may still generate code that disregards these instructions. This necessitates a post-verification process
to ensure the generated code is safe to execute. Following code generation, the Code Generator (CG)
parses the code into an Abstract Syntax Tree (AST) and examines it line by line for any violations. Ifh}(hj  hj  hKpuhh	uh(hhh}h}(hj
  hj
  usub$648d0617-39d4-4544-8e23-14755a87fceeh
)}(h
}(hX  violations are detected, an error message is reported, and a new code snippet is generated. The CG
will retry this process a maximum of three times; if violations persist, the CG will notify the planner
of its failure to generate compliant code.
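The verify-and-retry loop described above can be sketched with Python's standard ast module. This is an illustration under stated assumptions, not TaskWeaver's implementation: the rule sets, the function names find_violations and generate_verified, and the reading of "a maximum of three times" as three attempts in total are all hypothetical.

```python
import ast
from typing import Callable, List, Optional, Set

# Hypothetical rule sets; the real restricted list is configurable.
ALLOWED_FUNCTIONS: Set[str] = {"anomaly_detection", "pull_data_sql"}
BLOCKED_MODULES: Set[str] = {"os", "subprocess", "shutil"}


def find_violations(code: str, plugin_only: bool = False) -> List[str]:
    """Return human-readable rule violations found in the code's AST."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    violations = []
    for node in ast.walk(tree):
        # Collect the top-level module names touched by import statements.
        if isinstance(node, ast.Import):
            modules = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [(node.module or "").split(".")[0]]
        else:
            modules = []
        for module in modules:
            if module in BLOCKED_MODULES:
                violations.append(f"line {node.lineno}: import of '{module}' is not allowed")
        # In plugin-only mode, every call must target a known plugin function.
        if plugin_only and isinstance(node, ast.Call):
            name = node.func.id if isinstance(node.func, ast.Name) else None
            if name not in ALLOWED_FUNCTIONS:
                violations.append(f"line {node.lineno}: only plugin functions may be called")
    return violations


def generate_verified(generate: Callable[[List[str]], str], max_attempts: int = 3) -> Optional[str]:
    """Regenerate code until it passes verification or the attempts run out."""
    feedback: List[str] = []
    for _ in range(max_attempts):
        code = generate(feedback)  # previous violations are passed back as feedback
        feedback = find_violations(code)
        if not feedback:
            return code
    return None  # the caller would report the failure to the Planner
```

Here generate stands in for the LLM call: it receives the violations from the previous attempt so that the next snippet can avoid them.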
5.2.3 Code Auto Correctionh}(hj  hj  hKquhh	uh(hhh}h}(hj
  hj
  usub$d495b898-9c9e-451e-9a14-f436537f4d14h
)}(h
}(hX,  Code auto-correction shares similarities with the post-verification process. If the generated code
fails to execute, the exception message is reported to the Code Generator (CG), prompting it to retry
generating the code. The primary difference is that post-verification operates within the CG, whileh}(hj  hj  hKruhh	uh(hhh}h}(hj
  hj
  usub$53edd2e8-0201-4e65-a167-c21f2dc367b9h
)}(h
}(hX&  code auto-correction relies on the Code Executor to run the code and report errors. Notably, if the
execution fails, the planner can directly redirect the message to the CG instead of synthesizing the
report via the LLM. TaskWeaver permits the code to regenerate up to a maximum of three times.h}(hj  hj  hKsuhh	uh(hhh}h}(hj
  hj
  usub$e903a6d2-493d-413e-87be-6bcf510c894dh
)}(h
}(hX7  5.3 Code Executor
The Code Executor (CE) receives the code generated by the Code Generator (CG), collects dependent
modules and plugin definitions, executes it, preserves context information such as logs, and returns
an execution result to the Planner. In TaskWeaver, we implement CE based on Python Jupyter. Toh}(hj  hj  hKtuhh	uh(hhh}h}(hj
  hj
  usub$fd91f46d-9346-4d38-a219-0faa1cdd2bb1h
)}(h
}(hX(  prevent interference between different sessions, CE maintains a separate Jupyter process for each
session. In future implementations, separate sandboxes (e.g., Docker containers) will be employed
for improved isolation across sessions. When a session concludes, the Jupyter process is terminated.h}(hj  hj  hKuuhh	uh(hhh}h}(hj
  hj
  usub$9272e721-95d4-43e4-96b0-715221ff5916h
)}(h
}(hX  The execution result contains the following parts which are returned to the planner.
•Return Code : 0 if the code execution succeeds, and 1 otherwise.
•Logs : Logs are generated in two ways: (1) the stdout/stderr output of the program, and (2) logh}(hj  hj  hKvuhh	uh(hhh}h}(hj
  hj
  usub$47afd21b-ff04-403b-af7f-d2966c8548e3h
)}(h
}(hX3  messages recorded using a logging utility within the plugins.
•Output : This is the output of the Jupyter cell running the generated code.
•Artifacts : The generated code or the plugin calls may produce artifacts such as a CSV file or an
image. The user can download the artifacts via the provided URLs.h}(hj  hj  hKwuhh	uh(hhh}h}(hj
  hj
  usub$cc4410e9-4682-4395-9f45-a2884f8ec527h
)}(h
}(hAs discussed in Sec. 5.2.3, if the CE fails to execute the code from the CG, it will report the error
logs to the CG, who would attempt to revise the code to fix the issue. All the error information andh}(hj  hj  hKxuhh	uh(hhh}h}(hj
  hj
  usub$1a2a335e-b7e0-4a28-80bb-e3e981bc2d6bh
)}(h
}(hX%  the failed code are kept in the CG’s conversation history so that the CG is aware of the full execution
history and can precisely understand the state of the CE.
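As a rough sketch, the execution result described in this section could be modelled as a small record, together with a helper that turns it into the feedback message kept in the conversation history. The class and function names are hypothetical; the message wording is borrowed from the execution transcripts quoted elsewhere in this paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExecutionResult:
    """Hypothetical container for what the Code Executor returns."""
    return_code: int                                          # 0 on success, 1 on failure
    logs: List[str] = field(default_factory=list)             # stdout/stderr and plugin log messages
    output: str = ""                                          # output of the Jupyter cell
    artifacts: Dict[str, str] = field(default_factory=dict)   # artifact name -> download URL

    @property
    def succeeded(self) -> bool:
        return self.return_code == 0


def format_feedback(result: ExecutionResult) -> str:
    """Build the message appended to the conversation history after a run."""
    status = "succeeded" if result.succeeded else "failed"
    parts = [f"The execution of the generated python code above has {status}"]
    if result.logs:
        parts.append("During execution, the following messages were logged:")
        parts.extend(result.logs)
    if result.succeeded and result.output:
        parts.append(f"The result of above Python code after execution is: {result.output}")
    return "\n".join(parts)
```

On failure, the logs carry the exception text, which is what the CG would need in order to revise its code.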
6 Extension to Multi-Agents
This section briefly discusses the process of expanding the TaskWeaver to a multi-agent architecture.h}(hj  hj  hKyuhh	uh(hhh}h}(hj
  hj
  usub$26eeed97-0885-4b92-a1f0-de8f65cf76e4h
)}(h
}(hX3  Extending to a multi-agent architecture benefits from several aspects. Firstly, it is a good practice to
modularize a complex project into multiple agents, with each agent responsible for a specific set of
functions. This is especially useful if there are a large number of plugins. Secondly, this is a goodh}(hj  hj  hKzuhh	uh(hhh}h}(hj  hj  usub$1875e1c2-9c33-408f-a9b0-d2e1c3a53960h
)}(h
}(hway to add new functions, by plugging in a new agent, to an existing project without modifying
existing code.
There are two ways of extending the TaskWeaver in a multi-agent environment, as illustrated in Fig.h}(hj  hj  hK{uhh	uh(hhh}h}(hj  hj  usub$5cbcc15d-d032-44fd-af43-b59915049544h
)}(h
}(hX!  4. The first approach involves one agent (powered by TaskWeaver) calling other agents via its plugins.
Fig. 4 (a) depicts a simple example, although this can be extended to a more complex network
where multiple agents form a mesh network. The second approach involves embedding TaskWeaver-h}(hj  hj  hK|uhh	uh(hhh}h}(hj  hj  usub$d6cc95ec-f14f-40d7-b98e-ae2af13fe9a2h
)}(h
}(hpowered agents into an existing multi-agent framework, such as AutoGen [ 1], as demonstrated in Fig.
4 (b).
Figure 4: TaskWeaver in a multi-agent environment.
7 Case Studies
In this section, we demonstrate how to use TaskWeaver for practical tasks:h}(hj  hj  hK}uhh	uh(hhh}h}(hj"  hj#  usub$7074f881-53a0-462d-8888-7969be1b2e55h
)}(h
}(h• Anomaly detection based on data pulled from a database
• Stock price forecasting
7.1 Task 1: Anomaly Detection
We aim to identify anomalies within a time series dataset stored in an SQL database (sqlite3). Toh}(hj  hj  hK~uhh	uh(hhh}h}(hj+  hj,  usub$f79cb2f7-020a-4c19-b00c-6027b5b14104h
)}(h
}(hX*  accomplish this goal, we need to integrate the two plugins for data retrieval and time series anomaly
detection:
•pull_data_sql(nl_query) →sql, dataframe : Given a natural language query, pull data from a
database and return the result in a pandas DataFrame, along with the generated SQL query.h}(hj  hj  hKuhh	uh(hhh}h}(hj4  hj5  usub$f60bafc5-68cb-49a4-947f-45c8fd1801d3h
)}(h
}(hX
  •anomaly_detection →dataframe with anomalies, results description : This plugin utilizes a
straightforward 3-sigma algorithm to identify any abnormal data points in a time series. These data
points deviate from the mean value by more than three standard deviations.h}(hj  hj  hKuhh	uh(hhh}h}(hj=  hj>  usub$61ac8948-1fd7-4e52-b3ac-5ba36425dd02h
)}(h
}(hX  Note : There is no need to provide plugins for general tasks, such as reading or writing files, as the
TaskWeaver is capable of generating the necessary code for these tasks. This is an advantage of
TaskWeaver that reduces the development effort to bootstrap applications.h}(hj  hj  hKuhh	uh(hhh}h}(hjF  hjG  usub$0e9a1414-ebed-4fd2-ab50-9046c5706132h
)}(h
}(hX#  Fig.5 in Appendix A illustrated the detailed plan generated by the Planner of TaskWeaver:
1. Instruct CodeInterpreter to pull data from the time_series table in the database.
2. Confirm the columns to be detected anomalies.
3. Instruct CodeInterpreter to detect anomalies on the pulled data.h}(hj  hj  hKuhh	uh(hhh}h}(hjO  hjP  usub$624b7978-e626-458d-a887-3fa1c070fd4ch
)}(h
}(hX/  4. Report the detected anomalies to the user.
In the first step, the Code Interpreter invoked the pull_data_sql plugin function and automatically
filled the required parameters.
from typing import Tuple
import pandas as pd

query = "SELECT * FROM time_series"
df, description = sql_pull_data(query)h}(hj  hj  hKuhh	uh(hhh}h}(hjX  hjY  usub$5456377b-7719-4f5c-bd45-da0ffb6f7dech
)}(h
}(hX2  df, description
After obtaining the time series data from the database, TaskWeaver asked the user to provide the
column names as additional information to proceed with anomaly detection. The user then provided
the "ts" and "val" column names to detect anomalies. With the additional information confirmed,h}(hj  hj  hKuhh	uh(hhh}h}(hja  hjb  usub$7f773910-5ffb-45e5-a546-295c4096781ah
)}(h
}(hX   TaskWeaver proceeded to execute the third step, i.e., detecting anomalies using the anomaly_detection
plugin.
time_col_name = 'ts'
value_col_name = 'val'

anomaly_df, anomaly_description = anomaly_detection(df, time_col_name,
value_col_name)h}(hj  hj  hKuhh	uh(hhh}h}(hjj  hjk  usub$1c81d83e-6307-4399-9ed4-f0557ed58786h
)}(h
}(hanomaly_df, anomaly_description
In the end, 11 anomalies were detected and a response was composed for the user.
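For reference, the 3-sigma rule used by the anomaly_detection plugin can be sketched in a few lines. This standalone version is an assumption-laden illustration rather than the plugin's code: it works on a plain list of (timestamp, value) pairs instead of a DataFrame, uses the population standard deviation, and the function name and toy data are invented.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def three_sigma_anomalies(series: List[Tuple[str, float]]) -> Tuple[List[Tuple[str, float]], str]:
    """Return the points more than three standard deviations from the mean."""
    values = [value for _, value in series]
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    anomalies = [(ts, value) for ts, value in series if abs(value - mu) > 3 * sigma]
    description = f"{len(anomalies)} of {len(series)} points flagged as anomalies"
    return anomalies, description


# Toy data: 19 ordinary readings and one spike.
data = [(f"t{i}", 10.0) for i in range(19)] + [("t19", 100.0)]
anomalies, description = three_sigma_anomalies(data)
print(description)
```

Like the plugin, the sketch returns both the flagged points and a short textual description that could be relayed to the user.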
7.2 Task 2: Stock Price Forecasting
The user requested TaskWeaver to forecast the Nasdaq 100 (QQQ) index prices for the next sevenh}(hj  hj  hKuhh	uh(hhh}h}(hjs  hjt  usub$638a0638-443d-4d26-bf5f-0918c15b3488h
)}(h
}(hX	  days with the ARIMA model. The detailed execution process is shown in Fig.6 in Appendix A. Here
is the plan generated by the TaskWeaver Planner:
1. Instruct CodeInterpreter to obtain historical QQQ stock price data.
2. Preprocess the data.
3. Train the ARIMA model.h}(hj  hj  hKuhh	uh(hhh}h}(hj|  hj}  usub$b979f08d-27b1-4fe3-bfdb-35373e4a89e4h
)}(h
}(hX-  4. Forecast the stock price for the next 7 days.
5. Report the predicted stock prices to the user.
After receiving the first subtask from the Planner, the Code Interpreter proceeded to generate a code
snippet for the first step. The Code Interpreter thought to itself, "ProgramApe (internal code name)h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$6f1e1506-e6d3-4686-8a73-1bcd3f504010h
)}(h
}(hX  will generate a code snippet to load QQQ historical data using pandas_datareader library." The
resulting code snippet is as follows:
import pandas_datareader as pdr
import datetime
start_date = datetime.datetime.now() - datetime.timedelta(days=365)
end_date = datetime.datetime.now()h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$df9dee15-fa1d-4294-87a4-e54ff0a08571h
)}(h
}(hX,  qqq_data = pdr.get_data_yahoo('QQQ', start=start_date, end=end_date)
qqq_data
However, the execution failed and the following error information was reported:
The execution of the generated python code above has failed
During execution, the following messages were logged:
---h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$fea52d36-99e0-4ef8-adc2-2d8e48fbb2cah
)}(h
}(hTypeError Traceback (most recent call last)
Cell In[1], line 7
4 start_date = datetime.datetime.now() - datetime.timedelta(days=365)
5 end_date = datetime.datetime.now()h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$0d240902-7575-4232-9651-cf15f0ae3da4h
)}(h
}(h---> 7 qqq_data = pdr.get_data_yahoo(’QQQ’, start=start_date, end=end_date
)
8 qqq_data
File \path\to\python\Lib\site-packages\pandas_datareader\data.py:80, in
get_data_yahoo(*args, **kwargs)h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$fc5a178a-5d07-4992-a39c-7fbd3feef2f0h
)}(h
}(h79 def get_data_yahoo(*args, **kwargs):
---> 80 return YahooDailyReader(*args, **kwargs).read()
...
As a result, the Code Interpreter had to modify the code to resolve this issue.
# Import yfinance or install it if it's not available
try:h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$8965553f-ea26-46a8-9549-cff0f82ebbfbh
)}(h
}(h    import yfinance as yf
except ImportError:
    !pip install yfinance
    import yfinance as yf

# Load the historical price data for QQQ
qqq_data = yf.download('QQQ')
qqq_datah}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$680ef55e-87c7-4998-a991-28bfb29770bch
)}(h
}(hX)  In the aforementioned code, we can observe that TaskWeaver employed an alternative approach to
search for stock price data using the yfinance library. This time, the historical QQQ stock price data
was successfully downloaded and the results were forwarded to the Planner for further instructions.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$c1824fae-b7e1-40ca-808d-77b396d90d78h
)}(h
}(hX  The Planner received the downloaded stock price data and proceeded with the following steps of
preprocessing and ARIMA model training. Ultimately, TaskWeaver utilized the fitted ARIMA model
to predict the stock prices for the next 7 days.
8 Design Verificationh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$07fe385c-3c1b-4fec-b6aa-d295780fd19eh
)}(h
}(hIn this section, we use a set of tasks to verify that the TaskWeaver meets our requirements and design
considerations.
8.1 Planning
Task 1: ReAct ReAct[ 23] merges Reasoning and Acting for LLMs, providing a comprehensiveh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$1045b637-490d-4f64-8f6c-5ffd9213d8dfh
)}(h
}(hX/  paradigm for generating verbal reasoning traces and actions to accomplish a given task. By leveraging
this approach, the system can conduct real-time reasoning to create, update, and adapt plans while
observing the external environments, to incorporate additional information into the reasoning process.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$f18a7b56-527c-4327-a86d-c98ff4d85ad1h
)}(h
}(hTo evaluate the ReAct capabilities of TaskWeaver, we adopted an approach inspired by the Challenge
Task presented in [ 3]. This involved generating a collection of N files with the naming conventionh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$80ffe6af-fdaa-4b3a-9967-f179cf687c3eh
)}(h
}(hfile1.txt, file2.txt, ..., fileN.txt. Each file contained the instruction "read file{i+1}.txt in the same
directory to get the further information." The goal was to initiate the TaskWeaver to begin readingh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$7c7340ce-46a2-4117-a0ff-8a1f6cf39d00h
)}(h
}(hX8  file1.txt and sequentially progress to the subsequent files. The final file in the sequence contained
instructions to present a specific key to the user.
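The test fixture described above is easy to reproduce. The sketch below builds such a chain of files in a temporary directory and includes a simple reference walker that follows the instructions the way the Planner is expected to. The function names and the exact wording of the final file are assumptions, and n is assumed to be at least 1.

```python
import tempfile
from pathlib import Path


def build_file_chain(directory: Path, n: int, key: str) -> None:
    """Create file1.txt..fileN.txt, each pointing to the next; the last holds the key."""
    for i in range(1, n):
        (directory / f"file{i}.txt").write_text(
            f"read file{i + 1}.txt in the same directory to get the further information."
        )
    (directory / f"file{n}.txt").write_text(f"Show the key to user.\nThe key is {key}.")


def follow_chain(directory: Path, start: str = "file1.txt") -> str:
    """Reference walker: follow the instructions until a file names no successor."""
    current = directory / start
    while True:
        text = current.read_text()
        words = text.split()
        if len(words) > 1 and words[0] == "read":
            current = directory / words[1]  # the second word is the next file name
        else:
            return text


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_file_chain(root, n=3, key="12345")
    final_message = follow_chain(root)
print(final_message)
```

The walker hard-codes the instruction format, whereas TaskWeaver has to infer the next step from the file content at each round, which is what makes the task a test of ReAct-style replanning.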
The results of the evaluation are as follows:
Human: read the file_A.txt and follow the instructions in it
......
>>> [PLANNER->CODEINTERPRETER]h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$d6601bd7-30cd-4cbe-8a90-deffdaab060dh
)}(h
}(hX  Please read the content of the file file_A.txt
......
>>> [CODEINTERPRETER->PLANNER]
The execution of the generated python code above has succeeded
The stdout is:
read file_B.txt in the same directory to get the further information.
......
>>> [PLANNER->CODEINTERPRETER]h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$0cb4ca79-8523-4d64-951b-4d5a731d32f3h
)}(h
}(hX  Please read the content of the file file_B.txt
......
>>> [CODEINTERPRETER->PLANNER]
The execution of the generated python code above has succeeded
The stdout is:
read file_C.txt in the same directory to get the further information.
......
>>> [PLANNER->CODEINTERPRETER]h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj
  usub$b0223718-4b4f-49fd-a3ee-c2b50a3e656eh
)}(h
}(hX  Please read the content of the file file_C.txt
>>> [CODEINTERPRETER->PLANNER]
The execution of the generated python code above has succeeded
The result of above Python code after execution is: Show the key to user.
The key is 12345.
......
>>> [PLANNER->USER]h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$5fe56337-adaa-4247-80e4-de247af57182h
)}(h
}(hX@  The key is 12345.
In the depicted process, the Planner component of TaskWeaver initially broke down the task into two
steps and initiated the execution of the first step. This initial phase involved reading the contents
of the file_A.txt. Subsequently, the Code Interpreter received the Planner’s request and generatedh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$32420878-8270-47aa-8692-9dbb077be8bah
)}(h
}(hX+  a Python code snippet designed to extract the content from the specified file. The generated code
included an instruction to "read file_B.txt in the same directory to acquire further information."
Upon receiving this instruction, the Planner updated the plan, incorporating the directive to load theh}(hj  hj  hKuhh	uh(hhh}h}(hj'  hj(  usub$cce787f1-568b-4f1a-a257-f2ffc7d37cebh
)}(h
}(hXE  next file in the sequence. This iterative process continued until TaskWeaver successfully navigated
through the entire chain of files, culminating in the retrieval of the final message contained in the
last file. This concluding message held instructions to present the key to the user, signifying the
completion of the task.h}(hj  hj  hKuhh	uh(hhh}h}(hj0  hj1  usub$26f96e27-1fbb-4657-9171-0f8feba40f66h
)}(h
}(hX$  Task 2: Plan Decomposition The Planner is capable of breaking down a task into multiple subtasks
and delegating them to the Code Interpreter for execution one by one. However, given the cost and
efficiency of LLM, it is not practical to invoke LLM multiple times, particularly for some simpleh}(hj  hj  hKuhh	uh(hhh}h}(hj9  hj:  usub$59de17a1-059b-4664-b0dd-2c5ce70f353ch
)}(h
}(hX-  tasks. For this reason, we devised a two-phase planning approach that mandates a review of the
generated initial plan, for merging sub-tasks with sequential dependencies (refer to Sec.4.4 for more
details).
As an illustration, we tasked TaskWeaver with computing the mean and standard deviation of theh}(hj  hj  hKuhh	uh(hhh}h}(hjB  hjC  usub$b4f26fc4-a322-49ae-9a14-25bea10730e7h
)}(h
}(h’Count’ column in the file "/sample_data/demo_data.csv" and subsequently verifying the presence of
any values exceeding 3 standard deviations from the mean. The initial plan generated by the Planner
is as follows:
• 1. Load the data fileh}(hj  hj  hKuhh	uh(hhh}h}(hjK  hjL  usub$f9fd4c31-55f5-430b-a334-a4f3956f2fcfh
)}(h
}(h• 2. Calculate the mean and std of the ’Count’ column <sequentially depends on 1>
• 3. Check if there are any values larger than 3 std from the mean <sequentially depends on 2>h}(hj  hj  hKuhh	uh(hhh}h}(hjT  hjU  usub$1f231fcb-6065-4870-a343-4db4f63bee75h
)}(h
}(hX  • 4. Report the result to the user <interactively depends on 3>
It is noteworthy that the first and second steps can be combined into a single step since they can be
accomplished in a single snippet of code without any interventions. As a result, the Planner mergedh}(hj  hj  hKuhh	uh(hhh}h}(hj]  hj^  usub$df5eb47e-5c1d-4f3c-9eba-4ab20f9c9519h
)}(h
}(hthem into one: “Instruct CodeInterpreter to load the data file and calculate the mean and std of the
’Count’ column.” By doing so, we were able to reduce the number of LLM calls from 6 to 3 for the
data loading and the mean/std value calculation.h}(hj  hj  hKuhh	uh(hhh}h}(hjf  hjg  usub$994ab25c-f616-471d-8e12-1fa4ce80bf76h
)}(h
}(h8.2 Coding and Execution
Task 1: Plugin-Only Mode In TaskWeaver, we offer a plugin-only mode, which exclusively
permits the invocation of plugin functions and forbids the generation of arbitrary code. To validateh}(hj  hj  hKuhh	uh(hhh}h}(hjo  hjp  usub$795f5c33-a106-426c-a8a2-f8217906f70eh
)      }(h
}(hX"  this feature, we activate the plugin-only mode in the configuration file and instruct TaskWeaver to
generate 10 random numbers. Since the "rand" and "random" functions are not implemented as
plugin functions, the Code Interpreter refuses to generate the code due to the constraints. Even ifh}(hj  hj  hKuhh	uh(hhh}h}(hjx  hjy  usub$48e70f34-d82e-4378-97dd-729979e6b062h
)}(h
}(hXU  the code were generated, the code verification module would still intercept the execution due to the
prohibition of non-plugin functions.
Human: generate 10 random numbers
......
>>> [THOUGHT]
Code Interpreter understands the user’s request and the constraints.
However, generating random numbers requires the use of numpy or the randomh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$c2e54a2d-f311-483d-8900-f8a032fdf05eh
)}(h
}(hX  module, which is not allowed in this case.
>>> [CODEINTERPRETER->PLANNER]
I’m sorry, but I cannot generate random numbers without using numpy or the random
module, which are not allowed in this task. Please let me know if there’s anything
else I can help you with.
......h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$9bf5f8d4-8203-4d76-a9a8-ec1e7b89fd47h
)}(h
}(hX"  Task 2: Stateful Execution The Code Interpreter is stateful, meaning it maintains execution states
and variables within the same session. For example, we initially asked TaskWeaver to display the
column names of ./sample_data/demo_data.csv. Subsequently, we instructed TaskWeaver to executeh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$357cad0d-9144-4c82-9d00-eb1df3900e99h
)}(h
}(hX  an irrelevant task, such as “generate 10 random numbers.” Afterward, we requested the mean value
of the “Count” column in the previously loaded data. The TaskWeaver identified the need to use data
from the previous chat round and subsequently delivered an appropriate response.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$465c7927-4250-426e-9373-e451a4483fe3h
)}(h
}(hX$  Task 3: Auto Correction The Code Interpreter can make mistakes while generating Python code,
resulting in execution failures for the executor. To address this, we prompt the Code Interpreter to
revise its code based on the reported error information. We asked TaskWeaver to calculate the meanh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$b6a076d7-ea62-476a-afd6-19dacec7a71bh
)}(h
}(hX  value of sample_data/demo_data.csv. The Code Interpreter initially generated incorrect code due to
the absence of the data schema.
import pandas as pd

data_file_path = '../../../sample_data/demo_data.csv'
df = pd.read_csv(data_file_path)
mean_value = df.mean()
mean_valueh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$d74af2ba-99a3-4d4d-8ffa-8ed2bdde28e8h
)}(h
}(hX  Upon revising the code, it automatically identifies the columns with numerical data types and
calculates their average value successfully.
import pandas as pd

# Load the data file
data_file_path = '../../../sample_data/demo_data.csv'
df = pd.read_csv(data_file_path)
h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$2a1e1a9a-3ca6-40c6-9226-7b38e3a277d8h
)}(h
}(hX3  # Calculate the mean value of the loaded data
mean_value = df.mean(numeric_only=True)
mean_value
8.3 Safety
Task 1: Preventing File Deletion and Secret Key Leakage To ensure the safety and security of
the execution environment, we have developed a restricted list to prevent certain sensitive operations,h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$61ab9eaa-9eb1-4605-933d-c40750e84f14h
)}(h
}(hX]  which can be customized by developers. When requesting the TaskWeaver to delete a file in the
system folder or retrieve the secret key from global environment variables, it declines to execute these
tasks, as they are deemed high-risk operations.
9 Related Work
LLM and Prompt Engineering Recent advancements in natural language processing have beenh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$40c79b8a-a6bf-4e7b-a0fb-b8413df02814h
)}(h
}(hdriven by large language models (LLMs) [ 24] such as GPT [ 12], GPT-4 [ 17], Palm [ 10], and
Llama [ 19]. These models have not only revolutionized the field of natural language processing, buth}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$b7b7ecfe-1aa4-4e5d-858f-db266d2046a4h
)}(h
}(halso how humans interact with machines through applications such as ChatGPT. LLMs are pre-trained
on a vast amount of text data and then fine-tuned with reinforcement learning from human feedbackh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$23e1cd53-543a-4c7c-bc06-8d12c72c324ah
)}(h
}(h15 (RLHF) and Instruction Fine-Tuning (IFT) [ 12] to improve their response quality. To improve the per-
formance of LLMs on reasoning and decision-making tasks, various prompting engineering methodsh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$b2e98fa4-cbdf-4285-a10c-1765164ef442h
)}(h
}(hhave been proposed, including Chain-of-Thought (CoT) [ 21], zero-shot-CoT [ 14], and ReAct [ 23].
Some of these approaches have also been applied in TaskWeaver to enhance its performance.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$74daa0f9-fd94-49cb-9e7f-b3d7c052486bh
)}(h
}(hX"  Agent Recently, LLM-based agents have gained increasing attention. The fundamental concept is
to utilize LLMs as the core controller to make human-like decisions by observing the environment,
planning, and taking actions[ 7]. There are generally two types of agent systems: single-agent andh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$d27c23c6-660b-477f-b045-f39819eafabch
)}(h
}(hX,  multi-agent systems. The single-agent system focuses more on planning, observing, and acting within
the single agent’s own capability, including AutoGPT[ 2] and LangChain Agents[ 6]. Conversely, the
latter is more concentrated on leveraging multiple agents to work collaboratively. Typical examplesh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj 
  usub$5fd473df-e7f8-4b3e-baaa-b06540f4777dh
)}(h
}(hinclude BabyAGI[ 4], MetaGPT[ 13], AutoGen[ 1], CAMEL[ 15], and Multi-agent Debate (MAD)[ 16].
Our TaskWeaver is a single-agent framework that focuses on converting user requests into code, even
for plugin calls.
10 Conclusionh}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj	
  usub$bc0ad453-ec1b-454a-ba5d-8b0eba582b5dh
)}(h
}(hX-  In this paper, we introduced TaskWeaver, a code-first framework for building LLM-powered au-
tonomous agents that addresses the limitations of existing frameworks in handling rich data structures,
incorporating domain knowledge, and offering flexibility. TaskWeaver’s standout feature is its abilityh}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$3fd91949-3dbc-477a-9e5a-7018d9d8d6a2h
)}(h
}(hX(  to convert user requests into executable code while treating user-defined plugins as callable functions.
This approach enables the seamless integration of plugin execution with custom code execution,
catering to the diverse requirements of users and providing a more intuitive user experience. Weh}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$81fb1dba-47c1-4e20-b1e8-94d56776c8edh
)}(h
}(hX$  presented the design and implementation of TaskWeaver, highlighting its support for complex data
structures, flexible plugin usage, and intelligent task planning. We also demonstrated TaskWeaver’s
ability to leverage the coding capability of LLMs to implement complex logic and incorporateh}(hj  hj  hKuhh	uh(hhh}h}(hj#
  hj$
  usub$927b3015-d04e-4b98-b790-ec7eb8fb9a5ah
)}(h
}(hX$  domain-specific knowledge through examples. Furthermore, we discussed the efforts made towards
the secure execution of generated code and the provision of an easy-to-use interface for developers.
Through various case studies, we showcased the effectiveness of TaskWeaver in handling differenth}(hj  hj  hKuhh	uh(hhh}h}(hj,
  hj-
  usub$9fadd28c-3755-413b-9cb2-fb29d551d23dh
)}(h
}(htasks. Overall, TaskWeaver offers a powerful and flexible solution for building intelligent conversa-
tional agents. As LLMs continue to evolve and improve, TaskWeaver can facilitate more advanced
and sophisticated applications.
Referencesh}(hj  hj  hKuhh	uh(hhh}h}(hj5
  hj6
  usub$40c1216c-bb67-47a4-bd73-f5f71e90b9b2h
)}(h
}(h[1]Autogen. Available at: https://github.com/microsoft/autogen . Accessed on
[11/22/2023].
[2]Autogpt. Available at: https://github.com/Significant-Gravitas/AutoGPT . Ac-h}(hj  hj  hKuhh	uh(hhh}h}(hj>
  hj?
  usub$a64634be-2612-4ac4-b9cf-5ef9e78c8604h
)}(h
}(hcessed on [11/22/2023].
[3]Autogpt challenge. Available at: https://github.com/Significant-Gravitas/
AutoGPT/blob/master/docs/content/challenges/memory/challenge_a.md . Ac-
cessed on [11/22/2023].h}(hj  hj  hKuhh	uh(hhh}h}(hjG
  hjH
  usub$f0b24760-b897-4ec5-98a5-702a5317b528h
)}(h
}(h[4]Babyagi. Available at: https://github.com/yoheinakajima/babyagi . Accessed on
[11/22/2023].
[5]Jarvis. Available at: https://github.com/microsoft/JARVIS . Accessed on [11/22/2023].h}(hj  hj  hKuhh	uh(hhh}h}(hjP
  hjQ
  usub$f4a7a83a-c8dc-4763-937b-69fdbe839eadh
)}(h
}(h[6] Langchain. Available at: https://www.langchain.com/ . Accessed on [11/22/2023].
[7]Llm powered autonomous agents. Available at: https://lilianweng.github.io/posts/h}(hj  hj  hKuhh	uh(hhh}h}(hjY
  hjZ
  usub$12d721bf-6105-4844-a8a1-e90e9dbc76e2h
)}(h
}(h2023-06-23-agent/ . Accessed on [11/22/2023].
[8]Semantic kernel. Available at: https://github.com/microsoft/semantic-kernel . Ac-
cessed on [11/22/2023].h}(hj  hj  hKuhh	uh(hhh}h}(hjb
  hjc
  usub$aae98759-bb83-43a7-859f-7914a7d0b419h
)}(h
}(h[9]Transformers agents. Available at: https://huggingface.co/docs/transformers/
transformers_agents . Accessed on [11/22/2023].h}(hj  hj  hKuhh	uh(hhh}h}(hjk
  hjl
  usub$c05783fe-8bdd-4f63-a557-89dfb103a8e8h
)}(h
}(h16 [10] Rohan Anil, Andrew M Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos,
Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, et al. Palm 2 technical report.h}(hj  hj  hKuhh	uh(hhh}h}(hjt
  hju
  usub$bc4e16d5-9fc2-454d-b297-5a9f4f7309aah
)}(h
}(harXiv preprint arXiv:2305.10403 , 2023.
[11] Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andyh}(hj  hj  hKuhh	uh(hhh}h}(hj}
  hj~
  usub$17a587a2-237e-40c8-ac37-4d85659248f5h
)}(h
}(hJones, Nicholas Joseph, Benjamin Mann, Nova DasSarma, Nelson Elhage, Zac Hatfield-Dodds,
Danny Hernandez, Jackson Kernion, Kamal Ndousse, Catherine Olsson, Dario Amodei, Tom B.h}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$91f74701-29b4-4d81-9f04-9449160b8734h
)}(h
}(hBrown, Jack Clark, Sam McCandlish, Chris Olah, and Jared Kaplan. A general language
assistant as a laboratory for alignment. CoRR , abs/2112.00861, 2021.h}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$3b02194f-9eda-4f90-9490-041836b5def5h
)}(h
}(h[12] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal,
Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models areh}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$b0e1e26e-7b9d-4202-99b6-72d068cf0e08h
)}(h
}(hfew-shot learners. Advances in neural information processing systems , 33:1877–1901, 2020.
[13] Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Ceyao Zhang, Zili Wang, Stevenh}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$7a9d6efd-01cf-4bdf-af46-2250c6d6811ch
)}(h
}(hKa Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, et al. Metagpt: Meta programming for
multi-agent collaborative framework. arXiv preprint arXiv:2308.00352 , 2023.h}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$cadc440b-f29c-4950-9048-a01f51d4560dh
)}(h
}(h[14] Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large
language models are zero-shot reasoners. Advances in neural information processing systems ,
35:22199–22213, 2022.h}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$390c4d82-aa30-4750-b632-4627769c7e24h
)}(h
}(h[15] Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard
Ghanem. Camel: Communicative agents for" mind" exploration of large scale language modelh}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$c4dd4b0f-f6d1-46ca-a4d9-2443b071da76h
)}(h
}(hsociety. arXiv preprint arXiv:2303.17760 , 2023.
[16] Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang,h}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$e49df6d4-15c5-4dd7-8777-606abb3a5469h
)}(h
}(hZhaopeng Tu, and Shuming Shi. Encouraging divergent thinking in large language models
through multi-agent debate. arXiv preprint arXiv:2305.19118 , 2023.h}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$f544a5e1-d9b2-4164-92ca-ae8f88211697h
)}(h
}(h[17] OpenAI. Gpt-4 technical report. ArXiv , abs/2303.08774, 2023.
[18] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving languageh}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$e5ac95f2-326c-4079-9fda-6ea9259d6af2h
)}(h
}(hunderstanding with unsupervised learning. OpenAI Blog , 2018.
[19] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timo-h}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$0d078f97-a52f-40a0-8235-ade056ef118eh
)}(h
}(hthée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open
and efficient foundation language models. arXiv preprint arXiv:2302.13971 , 2023.h}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$9187f0e1-ea50-4f00-a1b8-52a959ec5d6ch
)}(h
}(h[20] Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen,
Jiakai Tang, Xu Chen, Yankai Lin, et al. A survey on large language model based autonomoush}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$b2e2dcaa-b660-4280-8f7e-843556d989beh
)}(h
}(hagents. arXiv preprint arXiv:2308.11432 , 2023.
[21] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le,h}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj
  usub$e5986472-26dc-4e53-ab06-c7be229c74a4h
)}(h
}(hDenny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models.
Advances in Neural Information Processing Systems , 35:24824–24837, 2022.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$358ef706-9f8f-40db-be75-7e7aa487ad5dh
)}(h
}(h[22] Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang,
Junzhe Wang, Senjie Jin, Enyu Zhou, et al. The rise and potential of large language modelh}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj  usub$378587ef-ab1d-4c85-893d-dc88c30e86d8h
)}(h
}(hbased agents: A survey. arXiv preprint arXiv:2309.07864 , 2023.
[23] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$3b7ad824-3272-450d-ada8-24e60f154aefh
)}(h
}(hReact: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629 ,
2022.
[24] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min,h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj   usub$c3333ce0-f4cd-4295-a660-927ef8204d3bh
)}(h
}(hzBeichen Zhang, Junjie Zhang, Zican Dong, et al. A survey of large language models. arXiv
preprint arXiv:2303.18223 , 2023.h}(hj  hj  hKuhh	uh(hhh}h}(hj(  hj)  usub$7e922143-d0b7-482c-ae5b-d59f8226e2adh
)}(h
}(h[25] Wangchunshu Zhou, Yuchen Eleanor Jiang, Long Li, Jialong Wu, Tiannan Wang, Shi Qiu,
Jintian Zhang, Jing Chen, Ruipu Wu, Shuai Wang, et al. Agents: An open-source frameworkh}(hj  hj  hKuhh	uh(hhh}h}(hj1  hj2  usub$1cde76f8-222c-4d3f-9810-cff3f701f6f7h
)}(h
}(hfor autonomous language agents. arXiv preprint arXiv:2309.07870 , 2023.
A Case Study Results
Figure 5: Case Study - Task 1: Anomaly Detection
Role Type Content
User To Planner detect anomaly on time_series table from databaseh}(hj  hj  hKuhh	uh(hhh}h}(hj:  hj;  usub$017046f6-3547-497c-8ef7-7b239305696ch
)}(h
}(hPlan1. instruct CodeInterpreter to pull data from the time_series table in the database
2. confirm the columns to be detected anomalies
3. instruct CodeInterpreter to detect anomalies on the pulled data
4. report the detected anomalies to the userh}(hj  hj  hKuhh	uh(hhh}h}(hjC  hjD  usub$41a93fe6-2ba8-4e29-bd55-605a12b652e4h
)}(h
}(hX>  Current_Plan_Step 1. instruct CodeInterpreter to pull data from the time_series table in the database
To CodeInterpreter Please pull data from the time_series table in the database
Thought ProgramApe will generate a code snippet to pull data from the time_series table in the
database using the sql_pull_data function.h}(hj  hj  hKuhh	uh(hhh}h}(hjL  hjM  usub$f7e9d762-cd53-4dc4-9ba7-d6197fe0ed3eh
)}(h
}(hX  Pythonquery = "SELECT * FROM time_series"
df, description = sql_pull_data(query)
df, description
Execution_Status SUCCESS
To Planner <Information above>
Plan <Plan does not change, not displayed>
Current_Plan_Step 2. confirm the columns to be detected anomaliesh}(hj  hj  hKuhh	uh(hhh}h}(hjU  hjV  usub$22a5ccc7-7e82-4625-a4b2-94b61c7b432fh
)}(h
}(hX  To UserI have pulled the data from the time_series table in the database. The table has the 
following columns: 'ts' and 'val'. Which column(s) do you want to detect anomalies on?
User To Planner please use ts and val columns
Plan <Plan does not change, not displayed>h}(hj  hj  hKuhh	uh(hhh}h}(hj^  hj_  usub$8854eb01-b427-4562-a4b4-0b1321214015h
)}(h
}(hX  Current_Plan_Step 3. instruct CodeInterpreter to detect anomalies on the pulled data
To CodeInterpreter Please detect anomalies on the 'ts' and 'val' columns of the pulled data
ThoughtProgramApe will generate a code snippet to detect anomalies on the 'ts' and 'val' columns h}(hj  hj  hKuhh	uh(hhh}h}(hjg  hjh  usub$3a41ac37-62f8-4337-a5ba-8a4e2a2ad1ddh
)}(h
}(hX   of the pulled data using the anomaly_detection function.
Pythontime_col_name = 'ts'
value_col_name = 'val'
df_anomalies, anomaly_description = anomaly_detection(df, time_col_name, value_col_name)
df_anomalies, anomaly_description
Execution_Status SUCCESSh}(hj  hj  hKuhh	uh(hhh}h}(hjp  hjq  usub$26eeac14-bf6e-4d64-bb38-01ec96ad8bf3h
)}(h
}(hX  Execution_ResultThe execution of the generated python code above has succeeded
The result of above Python code after execution is: 
(                 ts                                         val             Is_Anomaly
 0   2023-02-01 00:00:00+00:00   67814.0       Falseh}(hj  hj  hKuhh	uh(hhh}h}(hjy  hjz  usub$faa4ced6-e9d8-4093-823c-a67d2f86b3fbh
)}(h
}(h 1   2023-02-01 04:00:00+00:00   84569.0       False
 ..                        ...       ...         ...
 718 2023-05-31 16:00:00+00:00   89912.0       Falseh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$a51e9501-34d6-43d0-a21b-4b4764955ab1h
)}(h
}(h 719 2023-05-31 20:00:00+00:00   87249.0       False
 [720 rows x 3 columns],
 'There are 11 anomalies in the time series data')
To Planner <Information above>
Plan <Plan does not change, not displayed>h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$0db3070d-61f4-4851-bc86-ba8b8b987a94h
)}(h
}(hX   Current_Plan_Step 4. report the detected anomalies to the user
To User I have detected anomalies on the 'ts' and 'val' columns of the pulled data. There are 11
anomalies in the time series data.The execution of the generated python code above has succeededh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$11415b0d-4f7d-4528-be44-106829fa87b1h
)}(h
}(hThe result of above Python code after execution is:
(                       ts                              val   
 0    2023-02-01T00:00:00Z   67814.0
 1    2023-02-01T04:00:00Z   84569.0h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$8ae1938f-dbf9-4409-b6c8-57c2bb4fa332h
)}(h
}(h ..                    ...       ...
 718  2023-05-31T16:00:00Z   89912.0
 719  2023-05-31T20:00:00Z   87249.0
 [720 rows x 2 columns],h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$058abe8a-f270-4c4e-b5eb-096dc50c2a40h
)}(h
}(h 'I have generated a SQL query based on `SELECT * FROM time_series`.
The SQL query is SELECT * FROM time_series;.
There are 720 rows in the result.
Plannerh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$996d0c3d-6dfc-4cd5-a77c-09f88a99c04eh
)}(h
}(h18 Figure 6: Case Study - Task 2: Stock Price Forecasting
Role Type Content
User To Planner use ARIMA model to forecast QQQ in next 7 days
Plan1. instruct CodeInterpreter to load QQQ historical data
2. preprocess the data
3. train the ARIMA modelh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$f231cd40-7ec6-4446-90b9-3a164946ce97h
)}(h
}(hX  4. forecast QQQ for the next 7 days
5. report the forecasted values to the user
Current_Plan_Step 1. instruct CodeInterpreter to load QQQ historical data
To CodeInterpreter Please load QQQ historical data
ThoughtProgramApe will generate a code snippet to load QQQ historical data using h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$a913d70c-221e-4235-a666-0ee03f85f6e3h
)}(h
}(hX
  pandas_datareader library.
Python import pandas_datareader as pdr
import datetime
start_date = datetime.datetime.now() - datetime.timedelta(days=365)
end_date = datetime.datetime.now()
qqq_data = pdr.get_data_yahoo('QQQ', start=start_date, end=end_date)
qqq_datah}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$13ffb7bf-4b81-4e38-8234-1bf61fc110f7h
)}(h
}(hXN  Execution_Status FAILURE
Execution_Result The execution of the generated python code above has failed
During execution, the following messages were logged:
---
TypeError                                 Traceback (most recent call last)
Cell In[1], line 7
           4 start_date = datetime.datetime.now() - datetime.timedelta(days=365)h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$cefe48c7-cf83-4dcf-acaa-6577a303622dh
)}(h
}(h           5 end_date = datetime.datetime.now()
  ---> 7 qqq_data = pdr.get_data_yahoo('QQQ', start=start_date, end=end_date)
           8 qqq_datah}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$e9ce9091-0d4a-4c2d-8b3e-326520d39ea4h
)}(h
}(hFile \path\to\python\Lib\site-packages\pandas_datareader\data.py:80, in 
get_data_yahoo(*args, **kwargs)
         79 def get_data_yahoo(*args, **kwargs):
 --->  80     return YahooDailyReader(*args, **kwargs).read()h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$2d7d1bd5-f2db-4b62-956d-2c13c8c6d0e7h
)}(h
}(hX            ...
To CodeInterpreter The execution of the previous generated code has failed. If you think you can fix the
problem by rewriting the code, please generate code and run it again.
Otherwise, please explain the problem to me.
< Python> 
<Execution_Result>h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$17ac36c9-c065-4b92-94c1-99ebc9916ff4h
)}(h
}(hX  Thought ProgramApe understands that the execution of the previous round has failed.
Thought ProgramApe will attempt to fix the problem by using an alternative method to load QQQ
historical data using the yfinance library.
Python !pip install yfinance
import yfinance as yf
import datetimeh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$965788b2-83dc-47ca-a577-e6a1d604bdddh
)}(h
}(hX
  start_date = datetime.datetime.now () - datetime.timedelta (days=365)
end_date = datetime.datetime.now()
qqq_data = yf.download('QQQ', start=start_date, end=end_date)
qqq_data
Execution_Status SUCCESS
The execution of the generated python code above has succeededh}(hj  hj  hKuhh	uh(hhh}h}(hj   hj  usub$0d108c53-648e-4e0a-953c-4697d6eca174h
)}(h
}(hX  The result of above Python code after execution is:
To Planner <Information above>
Plan <Plan does not change, not displayed>
Current_Plan_Step 2. preprocess the data
To CodeInterpreter Please preprocess the QQQ historical data and prepare it for ARIMA model trainingh}(hj  hj  hKuhh	uh(hhh}h}(hj	  hj
  usub$10236c54-7108-4c75-ac7e-e938d8fce17dh
)}(h
}(hThoughtProgramApe will generate a code snippet to preprocess the QQQ historical data and 
prepare it for ARIMA model training.
Python import pandas as pd
# Keep only the 'Close' column
qqq_close = qqq_data['Close']
# Resample the data to daily frequencyh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$bc3ad3df-68dc-4474-ad13-8681324e059eh
)}(h
}(hX
  qqq_daily = qqq_close.resample ('D').mean()
# Fill missing values with the previous day's value
qqq_daily_filled = qqq_daily.fillna(method='ffill')
qqq_daily_filled
Execution_Status SUCCESS
Execution_ResultThe execution of the generated python code above has succeededh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$437b0d83-17dd-4549-b5d0-d96ca4318a55h
)}(h
}(hThe result of above Python code after execution is: Date
2022-11-28    282.709991
2022-11-29    280.570007
                 ...    
2023-11-23    390.059998
2023-11-24    389.510010h}(hj  hj  hKuhh	uh(hhh}h}(hj$  hj%  usub$714fb079-f9ad-42ba-9c57-17aec02012a1h
)}(h
}(hX  Freq: D, Name: Close, Length: 362, dtype: float64
To Planner <Information above>
Plan <Plan does not change, not displayed>
Current_Plan_Step 3. train the ARIMA model
To CodeInterpreter Please train the ARIMA model using the preprocessed QQQ historical datah}(hj  hj  hKuhh	uh(hhh}h}(hj-  hj.  usub$f83ea58b-6bd1-4aec-9fdf-02fde3195504h
)}(h
}(hThoughtProgramApe will generate a code snippet to train the ARIMA model using the 
preprocessed QQQ historical data.
Python from statsmodels.tsa.arima.model import ARIMA
# Fit the ARIMA model
model = ARIMA(qqq_daily_filled, order=(1, 1, 1))h}(hj  hj  hKuhh	uh(hhh}h}(hj6  hj7  usub$ff1ea11e-b3c8-418f-9c8c-719ae57bb47dh
)}(h
}(hX  model_fit = model.fit ()
model_fit
Execution_Status SUCCESS
Execution_Result The execution of the generated python code above has succeeded
The result of above Python code after execution is: 
<statsmodels.tsa.arima.model.ARIMAResultsWrapper at 0x7f53c83c9040>h}(hj  hj  hKuhh	uh(hhh}h}(hj?  hj@  usub$690acb39-0e60-4f7f-99d7-0dc1d576e313h
)}(h
}(hTo Planner <Information above>
Plan <Plan does not change, not displayed>
Current_Plan_Step 4. forecast QQQ for the next 7 days
To CodeInterpreter Please use the trained ARIMA model to forecast QQQ for the next 7 daysh}(hj  hj  hKuhh	uh(hhh}h}(hjH  hjI  usub$38e56134-110f-4112-a74a-3475267ab664h
)}(h
}(hX3  ThoughtProgramApe will generate a code snippet to use the trained ARIMA model to forecast 
QQQ for the next 7 days.
Python forecast_steps = 7
forecast = model_fit.forecast(steps=forecast_steps)
forecast
Execution_Status SUCCESS
Execution_ResultThe execution of the generated python code above has succeededh}(hj  hj  hKuhh	uh(hhh}h}(hjQ  hjR  usub$a56844a6-371d-49fe-bb70-5ac684b20c05h
)}(h
}(hThe result of above Python code after execution is: 
2023-11-25    389.557766
2023-11-26    389.530163
2023-11-27    389.546117
2023-11-28    389.536896h}(hj  hj  hKuhh	uh(hhh}h}(hjZ  hj[  usub$11443878-55a2-4d6d-8875-ea50ee202642h
)}(h
}(h2023-11-29    389.542226
2023-11-30    389.539145
2023-12-01    389.540926
Freq: D, Name: predicted_mean, dtype: float64
To Planner <Information above>h}(hj  hj  hKuhh	uh(hhh}h}(hjc  hjd  usub$228fd056-4bcf-4ebd-b8c0-e0e2fc54cacbh
)}(h
}(hPlan <Plan does not change, not displayed>
Current_Plan_Step 5. report the forecasted values to the user
To User The ARIMA model has been used to forecast QQQ for the next 7 days. The forecasted
values are as follows:h}(hj  hj  hKuhh	uh(hhh}h}(hjl  hjm  usub$a606f761-f78b-4da6-af3f-27b4e1c53d2bh
)}(h
}(h^2023-11-25: 389.56
2023-11-26: 389.53
2023-11-27: 389.55
2023-11-28: 389.54
2023-11-29: 389.54h}(hj  hj  hKuhh	uh(hhh}h}(hju  hjv  usub$f3456389-d2f3-4ece-827d-656daf74c465h
)}(h
}(h2023-11-30: 389.54
2023-12-01: 389.54
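The preprocessing step in the case study above (resampling to daily frequency, then filling gaps with the previous value) can be illustrated on a tiny series. This is a minimal sketch with made-up prices, not the QQQ data from the figure. It uses `ffill()`, which has the same effect as the `fillna(method='ffill')` call in the generated code; recent pandas releases deprecate the `method` argument of `fillna`.

```python
import pandas as pd

# Hypothetical closing prices on trading days only (Friday, Monday, Tuesday).
close = pd.Series(
    [100.0, 102.0, 101.0],
    index=pd.to_datetime(["2023-11-24", "2023-11-27", "2023-11-28"]),
    name="Close",
)

# Resample to calendar days; the weekend has no observations and becomes NaN.
daily = close.resample("D").mean()

# Carry the last known price forward over the gap.
daily_filled = daily.ffill()

print(daily_filled)
```

The filled series has one value per calendar day, which gives the ARIMA model a regular daily index to work with.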
Role Type Content
User To Planner use ARIMA model to forecast QQQ in next 7 daysh}(hj  hj  hKuhh	uh(hhh}h}(hj~  hj  usub$2f1e2411-9aa1-46a4-9fdb-0e8c191a0443h
)}(h
}(hX  Plan1. instruct CodeInterpreter to load QQQ historical data
2. preprocess the data
3. train the ARIMA model
4. forecast QQQ for the next 7 days
5. report the forecasted values to the user
Current_Plan_Step 1. instruct CodeInterpreter to load QQQ historical datah}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$451664cc-9c2d-4329-bc0b-004780d9666bh
)}(h
}(hX?  To CodeInterpreter Please load QQQ historical data
Thought ProgramApe will generate a code snippet to load QQQ historical data using
pandas_datareader library.
Python import pandas_datareader as pdr
import datetime
start_date = datetime.datetime.now() - datetime.timedelta(days=365)
end_date = datetime.datetime.now ()h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$be89f734-c5fc-44ad-b4aa-861e8cabb160h
)}(h
}(hX:  qqq_data = pdr.get_data_yahoo ('QQQ', start=start_date, end=end_date )
qqq_data
Execution_Status FAILURE
Execution_Result The execution of the generated python code above has failed
During execution, the following messages were logged:
---
TypeError                                 Traceback (most recent call last)h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$c8aba649-ebdb-466f-8376-e8d332014e3ch
)}(h
}(hCell In[1], line 7
           4 start_date = datetime.datetime.now() - datetime.timedelta(days=365)
           5 end_date = datetime.datetime.now()
  ---> 7 qqq_data = pdr.get_data_yahoo('QQQ', start=start_date, end=end_date)h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$263bafc1-9e41-4b9b-a937-318500d4d5f9h
)}(h
}(h           8 qqq_data
File \path\to\python\Lib\site-packages\pandas_datareader\data.py:80, in 
get_data_yahoo(*args, **kwargs)
         79 def get_data_yahoo(*args, **kwargs):h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$7f10541c-9981-45ad-ae48-013ad25b3f36h
)}(h
}(hX'   --->  80     return YahooDailyReader(*args, **kwargs).read()
          ...
To CodeInterpreter The execution of the previous generated code has failed. If you think you can fix the
problem by rewriting the code, please generate code and run it again.
Otherwise, please explain the problem to me.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$47fb6ee7-2429-4a6d-b1d0-a3e458cc73cch
)}(h
}(hX-  < Python> 
<Execution_Result>
Thought ProgramApe understands that the execution of the previous round has failed.
Thought ProgramApe will attempt to fix the problem by using an alternative method to load QQQ
historical data using the yfinance library.
Python !pip install yfinance
import yfinance as yfh}(hj  hj  hM uhh	uh(hhh}h}(hj  hj  usub$18f0cd86-b22f-4a8b-892b-8856b9d709e1h
)}(h
}(hX  import datetime
start_date = datetime.datetime.now() - datetime.timedelta(days=365)
end_date = datetime.datetime.now()
qqq_data = yf.download('QQQ', start=start_date, end=end_date)
qqq_data
Execution_Status SUCCESS
The execution of the generated python code above has succeededh}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$0f3aab77-8ac4-454e-91e8-f711544ec104h
)}(h
}(hX  The result of above Python code after execution is:
To Planner <Information above>
Plan <Plan does not change, not displayed>
Current_Plan_Step 2. preprocess the data
To CodeInterpreter Please preprocess the QQQ historical data and prepare it for ARIMA model trainingh}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$2205dea6-1351-442e-863a-ca6cedc8f399h
)}(h
}(hThoughtProgramApe will generate a code snippet to preprocess the QQQ historical data and 
prepare it for ARIMA model training.
Python import pandas as pd
# Keep only the 'Close' column
qqq_close = qqq_data['Close']
# Resample the data to daily frequencyh}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$569a3c3b-8f8d-4103-91a2-b5d3bd7ca76fh
)}(h
}(hX
  qqq_daily = qqq_close.resample ('D').mean()
# Fill missing values with the previous day's value
qqq_daily_filled = qqq_daily.fillna(method='ffill')
qqq_daily_filled
Execution_Status SUCCESS
Execution_ResultThe execution of the generated python code above has succeededh}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$680f3f0f-cbd6-49de-85e9-f42dcb939c09h
)}(h
}(hThe result of above Python code after execution is: Date
2022-11-28    282.709991
2022-11-29    280.570007
                 ...    
2023-11-23    390.059998
2023-11-24    389.510010h}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$0de9cac4-d217-4051-91fb-66ba0615bb1bh
)}(h
}(hX  Freq: D, Name: Close, Length: 362, dtype: float64
To Planner <Information above>
Plan <Plan does not change, not displayed>
Current_Plan_Step 3. train the ARIMA model
To CodeInterpreter Please train the ARIMA model using the preprocessed QQQ historical datah}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$92d89723-0188-4389-9e7e-3e70e98a56f9h
)}(h
}(hThoughtProgramApe will generate a code snippet to train the ARIMA model using the 
preprocessed QQQ historical data.
Python from statsmodels.tsa.arima.model import ARIMA
# Fit the ARIMA model
model = ARIMA(qqq_daily_filled, order=(1, 1, 1))h}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$29daef91-dacf-4a4d-85e8-48bfe8653326h
)}(h
}(hX  model_fit = model.fit ()
model_fit
Execution_Status SUCCESS
Execution_Result The execution of the generated python code above has succeeded
The result of above Python code after execution is: 
<statsmodels.tsa.arima.model.ARIMAResultsWrapper at 0x7f53c83c9040>h}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$da168c0a-8d6a-4757-aafc-a6115bd4c96eh
)}(h
}(hTo Planner <Information above>
Plan <Plan does not change, not displayed>
Current_Plan_Step 4. forecast QQQ for the next 7 days
To CodeInterpreter Please use the trained ARIMA model to forecast QQQ for the next 7 daysh}(hj  hj  hM	uhh	uh(hhh}h}(hj  hj  usub$3bf6c400-5992-4f21-8524-7d99304b825bh
)}(h
}(hX3  ThoughtProgramApe will generate a code snippet to use the trained ARIMA model to forecast 
QQQ for the next 7 days.
Python forecast_steps = 7
forecast = model_fit.forecast(steps=forecast_steps)
forecast
Execution_Status SUCCESS
Execution_ResultThe execution of the generated python code above has succeededh}(hj  hj  hM
uhh	uh(hhh}h}(hj  hj  usub$bb0d935c-47a5-4efb-bf6e-d6364cf6b1a0h
)}(h
}(hThe result of above Python code after execution is: 
2023-11-25    389.557766
2023-11-26    389.530163
2023-11-27    389.546117
2023-11-28    389.536896h}(hj  hj  hMuhh	uh(hhh}h}(hj   hj!  usub$55699e49-176d-4016-a49c-e82db8f32b62h
)}(h
}(h2023-11-29    389.542226
2023-11-30    389.539145
2023-12-01    389.540926
Freq: D, Name: predicted_mean, dtype: float64
To Planner <Information above>h}(hj  hj  hMuhh	uh(hhh}h}(hj)  hj*  usub$51057cc8-188a-4b34-8b32-c4eff862522fh
)}(h
}(hPlan <Plan does not change, not displayed>
Current_Plan_Step 5. report the forecasted values to the user
To User The ARIMA model has been used to forecast QQQ for the next 7 days. The forecasted
values are as follows:h}(hj  hj  hM
uhh	uh(hhh}h}(hj2  hj3  usub$ecb95b4c-ff46-4815-bc50-cff9542803e4h
)}(h
}(h^2023-11-25: 389.56
2023-11-26: 389.53
2023-11-27: 389.55
2023-11-28: 389.54
2023-11-29: 389.54h}(hj  hj  hMuhh	uh(hhh}h}(hj;  hj<  usub$cddf4e8e-a547-4aa6-959c-01d047e26272h
)}(h
}(h2023-11-30: 389.54
2023-12-01: 389.54
20 B Planning Exampleh}(hj  hj  hMuhh	uh(hhh}h}(hjD  hjE  usub$8ccb0789-3260-4cf5-bb79-482ba0394a25h
)}(h
}(hX  The following is a planning example for the user query of counting the rows of a CSV file. The
example consists of 4 posts:
1. This is the post from the user containing the user query.
2.This is the post from the Planner to the Code Interpreter. The post contains 3 attachments that areh}(hj  hj  hMuhh	uh(hhh}h}(hjM  hjN  usub$d3c7f88d-4a33-402d-a182-a7a97c68accbh
)}(h
}(hX  the initial plan, the plan (final), and the current step in the plan.
3. This is the post from the Code Interpreter to the Planner on the execution result.
4.This is the post from the Planner to the User, i.e., the response after successfully conducting the
query.h}(hj  hj  hMuhh	uh(hhh}h}(hjV  hjW  usub$2ed7d9c0-daba-4d5b-92b9-bd45fb86e3b3h
)}(h
}(hThese 4 posts consist of a round of conversation between the Planner and the User.
1user_query: count the rows of /home/data.csv
2post_list:
3- message: count the rows of /home/data.csv
4 send_from: User
5 send_to: Planner
6 attachment_list:h}(hj  hj  hMuhh	uh(hhh}h}(hj_  hj`  usub$6d747b38-8c86-4b32-a387-d606aa93ad10h
)}(h
}(h7- message: Please load /home/data.csv and count the rows
8 send_from: Planner
9 send_to: CodeInterpreter
10 attachment_list:
11 - type: init_plan
12 content: |-
13 1. load the data fileh}(hj  hj  hMuhh	uh(hhh}h}(hjh  hji  usub$d5a217d8-cbd0-478c-b985-63295f6610b0h
)}(h
}(h14 2. count the rows of the loaded data <sequential depend on 1>
15 3. report the result to the user <interactive depend on 2>
16 - type: plan
17 content: |-
18 1. instruct CI to load the file and count the rowsh}(hj  hj  hMuhh	uh(hhh}h}(hjq  hjr  usub$7e0daeb5-4c33-4d8e-881c-8490507599e5h
)}(h
}(h19 2. report the result to the user
20 - type: current_plan_step
21 content: 1. instruct CI to load the file and count the rows
22- message: Load successfully and there are 100 rows
23 send_from: CodeInterpreter
24 send_to: Planner
25 attachment_list:h}(hj  hj  hMuhh	uh(hhh}h}(hjz  hj{  usub$cb5978f9-4374-4670-8f36-fb230ad977adh
)}(h
}(h26- message: The file is loaded and there are 100 rows
27 send_from: Planner
28 send_to: User
29 attachment_list:
30 - type: init_plan
31 content: |-
32 1. load the data file
33 2. count the rows of the loaded data <sequential depend on 1>h}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$adac108f-f2d5-4492-87a4-12cb75c418abh
)}(h
}(h34 3. report the result to the user <interactive depend on 2>
35 - type: plan
36 content: |-
37 1. instruct CI to load the file and count the rows
38 2. report the result to the user
39 - type: current_plan_steph}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$41688d6b-f05f-443e-b1b6-7e1ad623de20h
)}(h
}(hX  40 content: 2. report the result to the user
C Code Generation Example
The following is a code generation example for the user query of generating 10 random numbers.
This example contains 2 posts:
1. This is the post from the user containing the user query.h}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$7d8e77cb-d48d-41b0-9010-78e4020d46edh
)}(h
}(hX  21 2.This is the post from the Code Interpreter to the Planner including thoughts, the generated code,
and the verification and execution results.
1user_query: generate 10 random numbers
2post_list:
3- message: generate 10 random numbers
4 send_from: Planner
5 send_to: CodeInterpreterh}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$e79b9115-554c-4d16-b0c6-11768e13a969h
)}(h
}(h6 attachment_list: []
7- message: The random numbers are 0.2, 0.4, 0.6, ...
8 send_from: CodeInterpreter
9 send_to: Planner
10 attachment_list:
11 - type: thought
12 content: CI will generate 10 random numbers using np.random.h}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$604d6b9b-bdac-4bf5-83d8-eb15e5ef5c89h
)}(h
}(hX  13 - type: python
14 content: |-
15 import numpy as np
16 random_numbers = np.random.rand(10)
17 random_numbers
18 - type: verification
19 content: CORRECT
20 - type: code_error
21 content: No error is detected.
22 - type: execution_status
23 content: SUCCESSh}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$efe46536-03b1-48a9-8f0b-148ec5a9c0ceh
)}(h
}(h24 - type: execution_result
25 content: The random numbers are 0.2, 0.4, 0.6, ...
D Plugin Example
The following is an example of the specification of a plugin named ‘anomaly_detection’. It has 3h}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$aa1de0c8-ac48-4e03-8895-6c4e0c2af4e0h
)}(h
}(hX  input arguments (df, ts_col, and val_col), and 2 return values (df and description).
1name: anomaly_detection
2description: >-
3 anomaly_detection function identifies anomalies from an input
4 DataFrame of time series. It will add a new column "Is_Anomaly",h}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$4191fa93-4d28-4374-845d-7e2a8222bf27h
)}(h
}(h5 where each entry will be marked with "True"
6 if the value is an anomaly or "False" otherwise.
7
8parameters:
9 - name: df
10 type: DataFrame
11 required: true
12 description: >-
13 the input data from which we can identify the anomaliesh}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$9c3e1a07-d95b-431e-9d58-973167220329h
)}(h
}(hX  14 with the 3-sigma algorithm.
15 - name: ts_col
16 type: str
17 required: true
18 description: name of the column that contains the datetime
19 - name: val_col
20 type: str
21 required: true
22 description: name of the column that contains the numeric values.
23h}(hj  hj  hMuhh	uh(hhh}h}(hj  hj  usub$e8cac764-cef2-4a23-9d33-f4803df937f6h
)}(h
}(hX)  24returns:
25 - name: df
26 type: DataFrame
27 description: >-
28 This DataFrame extends the input DataFrame with a newly-added
29 column "Is_Anomaly" containing the anomaly detection result.
30 - name: description
31 type: str
32 description: a string describing the anomaly detection results.h}(hj  hj  hM uhh	uh(hhh}h}(hj  hj  usub$79effb14-330a-4dd6-b308-e842935a463dh
)}(h
}(hThe following shows the Python implementation of the anomaly_detection plugin.
1def __call__(self, df: pd.DataFrame, ts_col: str, val_col: str):
2 try:
3 df[ts_col] = pd.to_datetime(df[ts_col])
4 except Exception:h}(hj  hj  hM!uhh	uh(hhh}h}(hj  hj  usub$60b03c94-0ce3-4edb-a669-9d9ac40a4314h
)}(h
}(h5 print("Time column is not datetime")
6 return
7
8 if not is_numeric_dtype(df[val_col]):
9 try:
10 df[val_col] = df[val_col].astype(float)
11 except ValueError:
12 print("Value column is not numeric")
13 return
14h}(hj  hj  hM"uhh	uh(hhh}h}(hj  hj  usub$59dda7a5-6ca3-4da7-8bdb-420ebb812869h
)}(h
}(h15 mean, std = df[val_col].mean(), df[val_col].std()
16 cutoff = std * 3
17 l, u = mean - cutoff, mean + cutoff
18 df["Is_Anomaly"] = df[val_col].apply(lambda x: x < l or x > u)h}(hj  hj  hM#uhh	uh(hhh}h}(hj  hj  usub$be7c3f24-b7fe-430e-8db0-57b9fd8c0dafh
)}(h
}(h19 anomaly_count = df["Is_Anomaly"].sum()
20 desc = f"There are {anomaly_count} anomalies in the data"
21
22 return df, desc
23h}(hj  hj  hM$uhh	uh(hhh}h}(hj  hj  usub$7cacb864-891d-46dd-a876-0b9e57a6403fh
)}(h
}(hX3  ---
id: code_verification
description: Code Verification
slug: code_verification
---

# Code Verification

## Why do we need code verification?
Agent developers may want to verify the generated code before running it.
In the following example shown in the figure, the generated code is 
```python
import os
h}(hdocs\verification.mdhid: code_verificationhK uhh	uh(hhh}h}(hj
  hj  usub$f61cbbd2-a82a-454c-ab60-dd729379f128h
)}(h
}(hfile = open("abc.py", "r")
code = file.read()
exec(code)
```

![Code verification workflow](./../static/img/code_verification.png)

In this case, the generated code read the content of a file and execute it, which is potentially dangerous.h}(hj  hj
  hKuhh	uh(hhh}h}(hj  hj  usub$0bae6a1e-eca1-4f1b-9e86-12f27049ff73h
)}(h
}(hTo prevent this, we can use code verification to check the generated code before running it.
If the code does not pass the verification, the errors will be reported to the CodeGenerator to re-generate the code.

## How to use code verification?h}(hj  hj
  hKuhh	uh(hhh}h}(hj  hj  usub$d6fb1ae6-3c87-4443-938f-a25938cf662dh
)}(h
}(hXX  To use code verification, you can add the configuration `"code_interpreter.code_verification_on": true`
in the project configuration file `taskweaver_config.json`.
Then, the generated code will be verified before running it.
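
As a minimal sketch, the flag can also be set programmatically. The helper name `enable_code_verification` is hypothetical and not part of TaskWeaver; only the key `code_interpreter.code_verification_on` and the file name `taskweaver_config.json` come from the text above.

```python
import json
from pathlib import Path


def enable_code_verification(config_path: str) -> dict:
    """Turn on code verification in a project configuration file.

    Hypothetical helper: existing keys are preserved, and the file is
    created if it does not exist yet.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # Key name taken from the documentation above.
    config["code_interpreter.code_verification_on"] = True
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Calling `enable_code_verification("taskweaver_config.json")` from the project directory would update the file in place.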

After enabling code verification, you can also specify the code verification rules in the project configuration file.
h}(hj  hj
  hKuhh	uh(hhh}h}(hj'  hj(  usub$4d3053ad-2cbd-49c9-a3bc-556d9922baadh
)}(h
}(h| Parameter	                          | Description	                                              | Default                                                                                                                                    |
|---|---|---|h}(hj  hj
  hKuhh	uh(hhh}h}(hj0  hj1  usub$0cd8c53a-b78b-4369-b106-d00edc65df25h
)}(h
}(h| code_interpreter.allowed_modules	   | The list of allowed modules to import in code generation. | 	["pandas", "matplotlib", "numpy", "sklearn", "scipy", "seaborn", "datetime", "typing"], if the list is empty, no modules would be allowed |h}(hj  hj
  hKuhh	uh(hhh}h}(hj9  hj:  usub$b77b53df-b032-4dc9-b8bd-fd58513d8c07h
)}(h
}(hX:  | code_interpreter.blocked_functions	 | The list of functions to block from code generation.      | 	["eval", "exec", "execfile", "compile", "open", "input", "raw_input", "reload"]                                                           |

We currently support checking the allowed modules and blocked functions.h}(hj  hj
  hKuhh	uh(hhh}h}(hjB  hjC  usub$430b16d1-097b-491f-b284-bc2049089821h
)}(h
}(hX  If a module is in the `allowed_modules` list, it can be imported in the generated code.
If a function is in the `blocked_functions` list, it cannot be used in the generated code.
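
The two rules can be illustrated with a small static check built on Python's standard `ast` module. This is only a sketch under stated assumptions, not TaskWeaver's own verifier: the function name `find_violations` and the error message format are hypothetical, and the default lists are copied from the table above.

```python
import ast

# Defaults copied from the configuration table above.
ALLOWED_MODULES = ["pandas", "matplotlib", "numpy", "sklearn", "scipy",
                   "seaborn", "datetime", "typing"]
BLOCKED_FUNCTIONS = ["eval", "exec", "execfile", "compile", "open",
                     "input", "raw_input", "reload"]


def find_violations(code, allowed_modules=ALLOWED_MODULES,
                    blocked_functions=BLOCKED_FUNCTIONS):
    """Return a list of rule violations found in `code` (hypothetical helper)."""
    errors = []
    for node in ast.walk(ast.parse(code)):
        # Collect the module names referenced by import statements.
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            # Compare the top-level package only, so "matplotlib.pyplot"
            # is accepted when "matplotlib" is allowed.
            if module.split(".")[0] not in allowed_modules:
                errors.append(f"line {node.lineno}: import of '{module}' is not allowed")
        # Flag direct calls to blocked functions, e.g. exec(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in blocked_functions:
                errors.append(f"line {node.lineno}: call to '{node.func.id}' is blocked")
    return errors


# The dangerous snippet from the beginning of this page.
snippet = 'import os\nfile = open("abc.py", "r")\ncode = file.read()\nexec(code)\n'
for error in find_violations(snippet):
    print(error)
```

The sketch only catches blocked functions that are called directly by name; calls made through an alias or an attribute lookup would need additional handling.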

## How is code verification implemented?
The code verification is implemented in two parts.h}(hj  hj
  hKuhh	uh(hhh}h}(hjK  hjL  usub$76aff861-5128-4bc1-93ac-c4de857c9b9ah
)}(h
}(hX  First, we add instructions to CodeGenerator's prompt, so it generates code following the required rules.
For example, if we don't want the generated code to import the `os` module, 
we will add an instruction to the prompt to tell CodeGenerator not to import the `os` module.
h}(hj  hj
  hKuhh	uh(hhh}h}(hjT  hjU  usub$1e5c226f-5791-49ca-9e96-86df661eab9ah
)}(h
}(hSecond, we conduct a static analysis on the generated code to check if it follows the required rules.
The code is parsed into an AST (Abstract Syntax Tree) and we check if the AST  
only imports the allowed modules, and does not use the blocked functions.h}(hj  hj
  hK	uhh	uh(hhh}h}(hj]  hj^  usub$975b1f93-720a-4262-beb1-2183d4203c28h
)}(h
}(hmThe implementation of the static analysis can be found in `taskweaver/code_interpreter/code_verification.py`.h}(hj  hj
  hK
uhh	uh(hhh}h}(hjf  hjg  usub$e5c0994e-f1a4-47bf-85bb-a7515136974dh
)}(h
}(h# Customizing Examples

There are two types of examples: (1) planning examples and (2) code interpreter examples. 
Planning examples are used to demonstrate how to use TaskWeaver to plan for a specific task. h}(h%docs\customization\example\example.mdh# Customizing ExampleshK uhh	uh(hhh}h}(hjo  hjp  usub$40a808ca-ff19-4f45-a2aa-7c765859c138h
)}(h
}(hXX  Code generation examples are used to demonstrate how to generate code or orchestrate plugins to perform a specific task.

## Planning Examples

A planning example tells LLMs how to plan for a specific query from the user; talk to the code interpreter; 
receive the execution result from the code interpreter; and summarize the execution result.h}(hjq  hjr  hKuhh	uh(hhh}h}(hjz  hj{  usub$d8e58197-edca-4232-9bfc-235bd8c671eah
)}(h
}(hX
  Before constructing the planning example, we strongly encourage you to go through the
[planner prompt](https://github.com/microsoft/TaskWeaver/blob/main/taskweaver/planner/planner_prompt.yaml).

The following is an example of a planning example which contains 4 posts. h}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$aa60a8e0-80ef-45d0-9aca-fd50e917df64h
)}(h
}(hEach post contains a message, a sender, a receiver, and a list of attachments.
1. The first post is sent from the user to the planner.
   The message is "count the rows of /home/data.csv", which is the same as the user query.h}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$4daf3109-5993-4a41-9c71-e5a08ff5e7f0h
)}(h
}(h2. The second post is sent from the planner to the code interpreter.
   The message is "Please load the data file /home/data.csv and count the rows of the loaded data".
   The attachment list contains 3 attachments:h}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$4ab09ae7-3255-432f-ad9d-ad215be33c99h
)}(h
}(h   1. The first attachment is the initial plan, which is a markdown string.
   2. The second attachment is the plan, which is a markdown string.
   3. The third attachment is the current plan step, which is a markdown string.h}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$6e8f514b-c80b-4dd4-9236-528634f0df75h
)}(h
}(h3. The third post is sent from the code interpreter to the planner.
   The message is "Load the data file /home/data.csv successfully and there are 100 rows in the data file".
4. The fourth post is sent from the planner to the user.h}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$72c9aaba-8574-41ef-bb92-3301abcf2c45h
)}(h
}(hX     The message is "The data file /home/data.csv is loaded and there are 100 rows in the data file".
   The attachment list contains 3 attachments:
   1. The first attachment is the initial plan, which is the same as the second attachment of the second post.h}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$cc312082-687e-408c-8a6a-bfe61db6339eh
)}(h
}(hX     2. The second attachment is the plan, which is the same as the third attachment of the second post.
   3. The third attachment is the current plan step, which is a markdown string.

```yaml
enabled: True
rounds:
  - user_query: count the rows of /home/data.csvh}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$c95919f4-9cdf-4d55-b8fe-7bdd9578d658h
)}(h
}(hX      state: created
    post_list:
      - message: count the rows of /home/data.csv
        send_from: User
        send_to: Planner
        attachment_list:
      - message: Please load the data file /home/data.csv and count the rows of the loaded data
        send_from: Plannerh}(hjq  hjr  hK	uhh	uh(hhh}h}(hj  hj  usub$b003a4ce-855a-4b73-9b10-db4ca6286d4fh
)}(h
}(h        send_to: CodeInterpreter
        attachment_list:
        - type: init_plan
          content: |-
            1. load the data file
            2. count the rows of the loaded data <sequentially depends on 1>h}(hjq  hjr  hK
uhh	uh(hhh}h}(hj  hj  usub$80e13336-d58a-440a-993c-1de7b8a75a51h
)}(h
}(hX'              3. report the result to the user <interactively depends on 2>
        - type: plan
          content: |-
            1. instruct CodeInterpreter to load the data file and count the rows of the loaded data
            2. report the result to the user
        - type: current_plan_steph}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$17d06d91-6e2b-4951-8dd6-55bc0d13c945h
)}(h
}(hX'            content: 1. instruct CodeInterpreter to load the data file and count the rows of the loaded data
      - message: Load the data file /home/data.csv successfully and there are 100 rows in the data file
        send_from: CodeInterpreter
        send_to: Planner
        attachment_list:h}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$c51390f7-e9f6-43f4-9ec6-0c8f6f96aea3h
)}(h
}(hX        - message: The data file /home/data.csv is loaded and there are 100 rows in the data file
        send_from: Planner
        send_to: User
        attachment_list:
          - type: init_plan
            content: |-
              1. load the data fileh}(hjq  hjr  hK
uhh	uh(hhh}h}(hj  hj  usub$8c3f8495-2402-4a0d-8236-f8b45ad87babh
)}(h
}(h              2. count the rows of the loaded data <sequentially depends on 1>
              3. report the result to the user <interactively depends on 2>
          - type: plan
            content: |-h}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$be2dcdf5-8c68-4ee3-80f2-0760f0d13626h
)}(h
}(hX                1. instruct CodeInterpreter to load the data file and count the rows of the loaded data
              2. report the result to the user
          - type: current_plan_step
            content: 2. report the result to the user
```
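
The structure described above can be checked mechanically before an example is used. The following sketch works on a plain dict that mirrors one round of the YAML; the helper `check_round` is hypothetical, and the rule that each post is sent by the receiver of the previous post is an assumption drawn from this example rather than a documented TaskWeaver requirement.

```python
# Every post in the example carries these four keys.
REQUIRED_POST_KEYS = {"message", "send_from", "send_to", "attachment_list"}


def check_round(round_):
    """Return a list of structural problems in one example round (hypothetical helper)."""
    problems = []
    posts = round_.get("post_list", [])
    for i, post in enumerate(posts, start=1):
        missing = REQUIRED_POST_KEYS - post.keys()
        if missing:
            problems.append(f"post {i}: missing {sorted(missing)}")
    # Assumption: the conversation is a chain, so each sender is the
    # receiver of the previous post.
    for i in range(1, len(posts)):
        if posts[i].get("send_from") != posts[i - 1].get("send_to"):
            problems.append(f"post {i + 1}: sender does not match previous receiver")
    return problems


# Abbreviated version of the round above (attachments omitted for brevity).
example_round = {
    "user_query": "count the rows of /home/data.csv",
    "post_list": [
        {"message": "count the rows of /home/data.csv",
         "send_from": "User", "send_to": "Planner", "attachment_list": None},
        {"message": "Please load the data file /home/data.csv and count the rows of the loaded data",
         "send_from": "Planner", "send_to": "CodeInterpreter", "attachment_list": []},
        {"message": "Load the data file /home/data.csv successfully and there are 100 rows in the data file",
         "send_from": "CodeInterpreter", "send_to": "Planner", "attachment_list": None},
        {"message": "The data file /home/data.csv is loaded and there are 100 rows in the data file",
         "send_from": "Planner", "send_to": "User", "attachment_list": []},
    ],
}
print(check_round(example_round))
```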

## Code Interpreter Examples
h}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$3eb2313a-46ff-4ace-abb5-aa9addca5894h
)}(h
}(hA code interpreter example tells LLMs how to generate code or orchestrate plugins to perform a specific task.
The task is from the planner. Before constructing the code interpreter example, we strongly encourage you toh}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$a6e1ee68-a03d-4d51-b3b0-4f369435def8h
)}(h
}(hread the [code generator prompt](https://github.com/microsoft/TaskWeaver/blob/main/taskweaver/code_interpreter/code_generator/code_generator_prompt.yaml). 

The following is an example of a code interpreter example which contains 2 posts.h}(hjq  hjr  hKuhh	uh(hhh}h}(hj
  hj  usub$3fd854aa-06b7-4071-953a-b4cda5dfe0c0h
)}(h
}(hX  Each post contains a message, a sender, a receiver, and a list of attachments.

1. The first post is sent from the planner to the code interpreter.
   The message is "read file /abc/def.txt".
2. The second post is sent from the code interpreter to the planner.h}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$0074a247-7100-4ba7-9860-f40fe112e29bh
)}(h
}(hX      The message is "read file /abc/def.txt".
   The attachment list contains 6 attachments:
   1. The first attachment is the thought of the code interpreter, which is a markdown string.
   2. The second attachment is the generated code, which is in python.h}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$d49142ba-7d40-4ddd-adbd-47511f886666h
)}(h
}(hX     3. The third attachment is the verification status, which is CORRECT, INCORRECT, or NONE.
   4. The fourth attachment is the verification error message, which is a markdown string.
   5. The fifth attachment is the execution status, which is SUCCESS, FAILURE, or NONE.h}(hjq  hjr  hKuhh	uh(hhh}h}(hj%  hj&  usub$534289eb-13f6-46a4-9b87-a1106b2979efh
)}(h
}(h   6. The sixth attachment is the execution result, which is a markdown string.

```yaml
enabled: True
rounds:
  - user_query: read file /abc/def.txt
    state: finished
    post_list:
      - message: read file /abc/def.txt
        send_from: Plannerh}(hjq  hjr  hKuhh	uh(hhh}h}(hj.  hj/  usub$b0c935a3-e83a-48e7-bf11-2879aef6151ch
)}(h
}(hX          send_to: CodeInterpreter
        attachment_list: []
      - message: I'm sorry, I cannot find the file /abc/def.txt. A FileNotFoundException has been raised.
        send_from: CodeInterpreter
        send_to: Planner
        attachment_list:
          - type: thoughth}(hjq  hjr  hKuhh	uh(hhh}h}(hj7  hj8  usub$3a748eed-7931-41a0-bdd0-63c4386b9a25h
)}(h
}(hX              content: "{ROLE_NAME} will generate a code snippet to read the file /abc/def.txt and present the content to the user."
          - type: python
            content: |-
              file_path = "/abc/def.txt"  

              with open(file_path, "r") as file:  h}(hjq  hjr  hKuhh	uh(hhh}h}(hj@  hjA  usub$79b35a73-62fb-47d1-84a0-c5fae44bed66h
)}(h
}(hX6                    file_contents = file.read()  
                  print(file_contents)
          - type: verification
            content: CORRECT
          - type: code_error
            content: No code error.
          - type: execution_status
            content: FAILURE
          - type: execution_resulth}(hjq  hjr  hKuhh	uh(hhh}h}(hjI  hjJ  usub$733129ca-17f6-4d2d-baae-d6ae0317ae9eh
)}(h
}(hX
              content: FileNotFoundException, the file /abc/def.txt does not exist.
```

In this example, `verification` is about whether the generated code is correct or not. 
We implemented a module to verify the generated code. If the code is syntactically incorrect, h}(hjq  hjr  hKuhh	uh(hhh}h}(hjR  hjS  usub$cde78bb3-a074-4831-863b-00494ce17269h
)}(h
}(hor the code violates the constraints, the verification status will be `INCORRECT` 
and some error messages will be returned.
A verification status of `NONE` means that the code has not been verified because verification is turned off.
h}(hjq  hjr  hKuhh	uh(hhh}h}(hj[  hj\  usub$72e73b38-ada3-4ebb-b53a-b9f38acccd08h
)}(h
}(hX:  In this example, `execution_status` is about whether the generated code can be executed successfully or not.
If the execution is successful, the execution status will be `SUCCESS` and the execution result will be returned.
Otherwise, the execution status will be `FAILURE` and some error messages will be returned.h}(hjq  hjr  hKuhh	uh(hhh}h}(hjd  hje  usub$d2e7078c-49a3-4e9a-963f-9a20c755eba9h
)}(h
}(hGA execution_status of `NONE` means that the code has not been executed.h}(hjq  hjr  hKuhh	uh(hhh}h}(hjm  hjn  usub$fa50a5e1-dbd2-475a-8839-77a4591422e9h
)}(h
}(hX  # Embedding

In TaskWeaver, we support various embedding models to generate embeddings for auto plugin selection.

## Embedding Configuration

- `llm.embedding_api_type`: The type of the embedding API. We support the following types:
  - openai
  - qwen
  - ollamah}(h&docs\customization\plugin\embedding.mdh# EmbeddinghK uhh	uh(hhh}h}(hjv  hjw  usub$ed08c264-d529-4959-8706-db833ef376e2h
)}(h
}(hX    - sentence_transformers
  - glm

- `llm.embedding_model`: The embedding model name. The model name should be aligned with `llm.embedding_api_type`.
   We only list some embedding models we have tested below:
  - openai
    - text-embedding-ada-002
  - qwenh}(hjx  hjy  hKuhh	uh(hhh}h}(hj  hj  usub$3ffd022f-11db-483e-ab2c-6ddd1a9b8b4ch
)}(h
}(h    - text-embedding-v1
  - ollama
    - llama2
  - sentence_transformers
    - all-mpnet-base-v2
    - multi-qa-mpnet-base-dot-v1
    - all-distilroberta-v1
    - all-MiniLM-L12-v2h}(hjx  hjy  hKuhh	uh(hhh}h}(hj  hj  usub$ab1ea8e4-b51c-47e9-ba84-6cbbba4deffah
)}(h
}(h    - multi-qa-MiniLM-L6-cos-v1
  - zhipuai
    - embedding-2
You also can use other embedding models supported by the above embedding APIs.h}(hjx  hjy  hKuhh	uh(hhh}h}(hj  hj  usub$7684187e-2120-4da3-9ade-eda1a52a3eb3h
)}(h
}(hX  ---
id: develop_plugin
description: How to develop a new plugin
slug: /plugin/how_to_develop_a_new_plugin
---
# How to develop a new plugin

In this tutorial, we will show how to develop a strawman plugin in TaskWeaver. This plugin renders an input text as ASCII art.
h}(h8docs\customization\plugin\how_to_develop_a_new_plugin.mdhid: develop_pluginhK uhh	uh(hhh}h}(hj  hj  usub$1f5efe04-7ca5-4b0c-a7b0-6d8e12ec880ah
)}(h
}(h## Implement the python code

Create a python file named `ascii_render.py` in the `plugins` folder. The file name should be the same as the plugin name. The plugin name is defined in the plugin schema. In this example, the plugin name is `ascii_render`.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$b552063f-a9c6-44ce-b775-0910a2631daeh
)}(h
}(hXS  The following code is the template of the plugin implementation.
```python
from taskweaver.plugin import Plugin, register_plugin

@register_plugin
class PluginTemplate(Plugin):
    def __call__(self, *args, **kwargs):
        """Implementation Starts"""
        result, description = YourImplementation()
        """Implementation Ends"""
h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$4330a885-a428-477b-a9f6-459664121fe5h
)}(h
}(hX-          # if your want to add artifact from the execution result, uncomment the following code
        # self.ctx.add_artifact(
        #     name="artifact",
        #     file_name="artifact.csv",
        #     type="df",
        #     val=result,
        # )
        return result, description
```
h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$36c393b0-bfe7-4df7-944d-9509cd87cfadh
)}(h
}(hXP  The typical way of implementing the plugin is to change the code between `Implementation Starts` and `Implementation Ends`. Note that the return are two variables _result_ and _description_. The _result_ stores whatever output required for follow-up processing (e.g., a DataFrame). The _description_ is a string to describe the result. h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$81a98334-a72c-4699-a325-e7f0f26c8c56h
)}(h
}(h
Let's make some changes to the code and the result is as follows:

```python
from taskweaver.plugin import Plugin, register_plugin

@register_plugin
class AsciiRenderPlugin(Plugin):
    def __call__(self, text: str):
        import pyfigleth}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$185db1b5-bf96-48b6-9daa-0bef4f1bc86dh
)}(h
}(h        ascii_art_str = pyfiglet.figlet_format(text, font='isometric1')
        return ascii_art_str
```
Note that this function depends on the package `pyfiglet`, so we need to install it with `pip install pyfiglet`.

## Configure the schema
h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$f4e2bc43-f0c2-438d-bcd1-a66af302ee57h
)}(h
}(hNext, we need to configure the schema so that the LLM can understand the function 
of the plugin. In the schema, there are several fields that should be filled, 
including `name`, `enabled`, `required`, `description`, `parameters` and `returns`. h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$52dd67ff-5190-4d62-a916-5f1cd57df3b3$      h
)}(h
}(hX  Please check [Plugin Introduction](https://microsoft.github.io/TaskWeaver/docs/plugin/plugin_intro) 
for more details. 
Create a yaml file named `ascii_render.yaml` and copy the following content into it.

```yaml
name: ascii_render
enabled: true
required: true
description: >-h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$468124c7-15d6-41e6-94c5-57ac6d13bfeeh
)}(h
}(hX    This plugin renders the input text into ASCII art form. 
  The input should be a string and the output is also a string in ASCII art.
  For example, result = ascii_render(text='Hello World').

parameters:
  - name: text
    type: str
    required: true
    description: >-h}(hj  hj  hK	uhh	uh(hhh}h}(hj  hj  usub$30770426-1b68-48c8-8543-2acdd75171cfh
)}(h
}(hX        This is the input text to be rendered into ASCII art form.

returns:
  - name: result
    type: str
    description: >-
      The rendered text in ASCII art.
```
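
Before trying the plugin, it can help to sanity-check the schema. The sketch below uses a plain dict that mirrors the YAML above; the helper names are hypothetical, and treating all six fields as mandatory is an assumption based on the field list in this section.

```python
from pathlib import Path

# Fields named in this section as the ones that should be filled.
SCHEMA_FIELDS = ["name", "enabled", "required", "description", "parameters", "returns"]


def missing_schema_fields(schema):
    """Return the schema fields that are absent (hypothetical helper)."""
    return [field for field in SCHEMA_FIELDS if field not in schema]


def file_matches_plugin_name(schema, python_file):
    """Check that the Python file name equals the plugin name, as required above."""
    return Path(python_file).stem == schema.get("name")


# Dict form of ascii_render.yaml, with descriptions shortened.
schema = {
    "name": "ascii_render",
    "enabled": True,
    "required": True,
    "description": "This plugin renders the input text into ASCII art form.",
    "parameters": [{"name": "text", "type": "str", "required": True,
                    "description": "This is the input text to be rendered into ASCII art form."}],
    "returns": [{"name": "result", "type": "str",
                 "description": "The rendered text in ASCII art."}],
}
print(missing_schema_fields(schema))
print(file_matches_plugin_name(schema, "plugins/ascii_render.py"))
```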

## Call the plugin

After the plugin is implemented and configured, we can call the plugin in the conversation.h}(hj  hj  hK
uhh	uh(hhh}h}(hj  hj  usub$32cef8e0-caaa-46f0-be0a-c349ea795cceh
)}(h
}(hThe full conversation is as follows:
```bash
=========================================================
 ___         _     _       __
|_   _|_ _ ___| | _ | |     / /__  ___ __   ___  ___h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$4cda8de7-38a7-48ec-9624-08ee097f0d53h
)}(h
}(ho  | |/ _` / __| |/ /| | /| / / _ \/ __ `/ | / / _ \/ ___/
  | | (_| \__ \   < | |/ |/ /  __/ /_/ /| |/ /  __/ /h}(hj  hj  hKuhh	uh(hhh}h}(hj
  hj  usub$fb7f81c6-f068-43d6-8e68-77847afe85f7h
)}(h
}(hX     |_|\__,_|___/_|\_\|__/|__/\___/\__,_/ |___/\___/_/
=========================================================
TaskWeaver: I am TaskWeaver, an AI assistant. To get started, could you please enter your request?
Human: render ABC in ascii art
>>> [INIT_PLAN]h}(hj  hj  hK
uhh	uh(hhh}h}(hj  hj  usub$3578dadf-9ffc-464c-a712-996d0099be42h
)}(h
}(h1. Render the text 'ABC' into ASCII art
>>> [PLAN]
1. Instruct CodeInterpreter to render the text 'ABC' into ASCII art
>>> [CURRENT_PLAN_STEP]
1. Instruct CodeInterpreter to render the text 'ABC' into ASCII art
>>> [SEND_TO]
CodeInterpreter
>>> [MESSAGE]h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$f092d578-36e0-4874-966a-b70c2fb44096h
)}(h
}(hPlease render the text 'ABC' into ASCII art
>>> [PLANNER->CODEINTERPRETER]
Please render the text 'ABC' into ASCII art
>>> [PYTHON]
# Call the plugin function to render the text 'ABC' into ASCII art
ascii_art_result = ascii_render("ABC")
ascii_art_resulth}(hj  hj  hKuhh	uh(hhh}h}(hj%  hj&  usub$17bd10ca-4187-4efc-81c0-6348edc526fah
)}(h
}(hX  >>> [VERIFICATION]
NONE
>>> [STATUS]
SUCCESS
>>> [RESULT]
The execution of the generated python code above has succeeded

The result of above Python code after execution is:
      ___           ___           ___     
     /\  \         /\  \         /\  \    h}(hj  hj  hKuhh	uh(hhh}h}(hj.  hj/  usub$7af3672b-eaaa-4f48-a345-ff4df3927eb5h
)}(h
}(hU    /::\  \       /::\  \       /::\  \   
   /:/\:\  \     /:/\:\  \     /:/\:\  \  h}(hj  hj  hKuhh	uh(hhh}h}(hj7  hj8  usub$81c5d3c4-d2de-42e3-8e9c-5ad463af923dh
)}(h
}(hU  /::\~\:\  \   /::\~\:\__\   /:/  \:\  \ 
 /:/\:\ \:\__\ /:/\:\ \:|__| /:/__/ \:\__\h}(hj  hj  hKuhh	uh(hhh}h}(hj@  hjA  usub$78b0d1ca-9c14-450c-bf43-a8c7632cac1eh
)}(h
}(h \/__\:\/:/  / \:\~\:\/:/  / \:\  \  \/__/
      \::/  /   \:\ \::/  /   \:\  \      
      /:/  /     \:\/:/  /     \:\  \     h}(hj  hj  hKuhh	uh(hhh}h}(hjI  hjJ  usub$bcb0e816-6cd0-4ae5-8909-bf156a6474f8h
)}(h
}(h     /:/  /       \::/__/       \:\__\    
     \/__/         ~~            \/__/    

>>> [CODEINTERPRETER->PLANNER]
The following python code has been executed:

# Call the plugin function to render the text 'ABC' into ASCII arth}(hj  hj  hKuhh	uh(hhh}h}(hjR  hjS  usub$06e897c8-ed76-482d-aa37-884e2d39dabbh
)}(h
}(hX  ascii_art_result = ascii_render("ABC")
ascii_art_result

The execution of the generated python code above has succeeded

The result of above Python code after execution is:
      ___           ___           ___     
     /\  \         /\  \         /\  \    h}(hj  hj  hKuhh	uh(hhh}h}(hj[  hj\  usub$ed001fe8-71ca-4376-b395-5fbd67b8f5e2h
)}(h
}(hU    /::\  \       /::\  \       /::\  \   
   /:/\:\  \     /:/\:\  \     /:/\:\  \  h}(hj  hj  hKuhh	uh(hhh}h}(hjd  hje  usub$27b82a49-81e4-48ab-b6f1-ea6a8b53f081h
)}(h
}(hU  /::\~\:\  \   /::\~\:\__\   /:/  \:\  \ 
 /:/\:\ \:\__\ /:/\:\ \:|__| /:/__/ \:\__\h}(hj  hj  hKuhh	uh(hhh}h}(hjm  hjn  usub$6b37efd8-1e16-47e3-8045-1b74b6140374h
)}(h
}(h \/__\:\/:/  / \:\~\:\/:/  / \:\  \  \/__/
      \::/  /   \:\ \::/  /   \:\  \      
      /:/  /     \:\/:/  /     \:\  \     h}(hj  hj  hKuhh	uh(hhh}h}(hjv  hjw  usub$b93d5b0b-e5c6-419a-a5d8-2fcbc864df4eh
)}(h
}(h     /:/  /       \::/__/       \:\__\    
     \/__/         ~~            \/__/    

>>> [INIT_PLAN]  
1. Render the text 'ABC' into ASCII art
>>> [PLAN]
1. Instruct CodeInterpreter to render the text 'ABC' into ASCII arth}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$c54c727f-6604-4fc4-bbf4-68a864ff7223h
)}(h
}(h>>> [CURRENT_PLAN_STEP]
Report the ASCII art to the user
>>> [SEND_TO]
User
>>> [MESSAGE]
Here is the text 'ABC' rendered in ASCII art:
      ___           ___           ___     
     /\  \         /\  \         /\  \    h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$43830db0-5ec6-4fe7-8718-b758433d5e69h
)}(h
}(hU    /::\  \       /::\  \       /::\  \   
   /:/\:\  \     /:/\:\  \     /:/\:\  \  h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$bb97114e-7c94-4a56-9953-b2a94803d7d5h
)}(h
}(hU  /::\~\:\  \   /::\~\:\__\   /:/  \:\  \ 
 /:/\:\ \:\__\ /:/\:\ \:|__| /:/__/ \:\__\h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$f6b7de8a-71b1-4f84-9975-0dde7163fa03h
)}(h
}(h \/__\:\/:/  / \:\~\:\/:/  / \:\  \  \/__/
      \::/  /   \:\ \::/  /   \:\  \      
      /:/  /     \:\/:/  /     \:\  \     h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$f50e9a60-994c-4c36-8f6e-779e660a5dfch
)}(h
}(h     /:/  /       \::/__/       \:\__\    
     \/__/         ~~            \/__/    
>>> [PLANNER->USER]
Here is the text 'ABC' rendered in ASCII art:
      ___           ___           ___     
     /\  \         /\  \         /\  \    h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$820e88bf-85cb-420a-8e90-84944bbfd6f4h
)}(h
}(hU    /::\  \       /::\  \       /::\  \   
   /:/\:\  \     /:/\:\  \     /:/\:\  \  h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$e45a66cf-6a76-4a29-9eb3-5311e841f18ch
)}(h
}(hU  /::\~\:\  \   /::\~\:\__\   /:/  \:\  \ 
 /:/\:\ \:\__\ /:/\:\ \:|__| /:/__/ \:\__\h}(hj  hj  hK uhh	uh(hhh}h}(hj  hj  usub$bb870aad-d301-438d-8a77-e4fb507e9cb9h
)}(h
}(h \/__\:\/:/  / \:\~\:\/:/  / \:\  \  \/__/
      \::/  /   \:\ \::/  /   \:\  \      
      /:/  /     \:\/:/  /     \:\  \     h}(hj  hj  hK!uhh	uh(hhh}h}(hj  hj  usub$03f38c8e-f5c4-43a8-a373-e69462154002h
)}(h
}(h     /:/  /       \::/__/       \:\__\    
     \/__/         ~~            \/__/    
TaskWeaver: Here is the text 'ABC' rendered in ASCII art:
      ___           ___           ___     
     /\  \         /\  \         /\  \    h}(hj  hj  hK"uhh	uh(hhh}h}(hj  hj  usub$a9a548d1-e2a0-4d10-8fb6-c15e0c76e981h
)}(h
}(hU    /::\  \       /::\  \       /::\  \   
   /:/\:\  \     /:/\:\  \     /:/\:\  \  h}(hj  hj  hK#uhh	uh(hhh}h}(hj  hj  usub$0743c89f-2085-4187-962f-1305500347a0h
)}(h
}(hU  /::\~\:\  \   /::\~\:\__\   /:/  \:\  \ 
 /:/\:\ \:\__\ /:/\:\ \:|__| /:/__/ \:\__\h}(hj  hj  hK$uhh	uh(hhh}h}(hj  hj  usub$7368e6ff-d352-410c-9a66-ea76c29eb18bh
)}(h
}(h \/__\:\/:/  / \:\~\:\/:/  / \:\  \  \/__/
      \::/  /   \:\ \::/  /   \:\  \      
      /:/  /     \:\/:/  /     \:\  \     h}(hj  hj  hK%uhh	uh(hhh}h}(hj  hj  usub$85b31963-33a1-41de-a014-4f2c0f875faah
)}(h
}(h]     /:/  /       \::/__/       \:\__\    
     \/__/         ~~            \/__/        
```h}(hj  hj  hK&uhh	uh(hhh}h}(hj  hj  usub$4740501f-bf2e-482c-83bc-380a76b31ff1h
)}(h
}(hX  ---
id: multi_yaml_single_impl
description: Multiple YAML files to one Python implementation
slug: /plugin/multi_yaml_single_impl
---

# Multiple YAML files to one Python implementation

In practice, you may want to have multiple plugins that share the same implementation.h}(h3docs\customization\plugin\multi_yaml_single_impl.mdhid: multi_yaml_single_implhK uhh	uh(hhh}h}(hj  hj  usub$fce9b2b0-0ff7-490a-a321-4153b373eca2h
)}(h
}(hX  For example, you may want to have two plugins that both pull data from a database, but one pulls data from database A and the other pulls data from database B.
In this case, the plugin implementation code is the same, but the plugin configuration is different.h}(hj  hj   hKuhh	uh(hhh}h}(hj  hj	  usub$d6c5e519-59fe-41d4-8f3f-eec0b4659ff6h
)}(h
}(hXM  It would be cumbersome to copy and paste the same implementation code to two different files.

To solve this problem, TaskWeaver allows you to have multiple plugin configurations that share the same implementation.
Here is an example of the plugin configuration for the two plugins that pull data from database A and B respectively:
h}(hj  hj   hKuhh	uh(hhh}h}(hj  hj  usub$8beb8738-6ff6-4257-865a-002970ddee7bh
)}(h
}(hThe configuration for the plugin that pulls data from database A:
```yaml
name: sql_pull_data_from_A
code: sql_pull_data
...
description: >-
  Pull data from a SQL database A. The database A contains information about merchandise sales.
examples: |-h}(hj  hj   hKuhh	uh(hhh}h}(hj  hj  usub$cf4c70e8-fe96-4627-9952-006671308cc8h
)}(h
}(h  df, description = sql_pull_data_from_A("pull data from time_series table")
parameters:
  ...
returns:
  ...
configurations:
  ...
  sqlite_db_path: /path/to/sqlite_A.db
```

The configuration for the plugin that pulls data from database B:
```yamlh}(hj  hj   hKuhh	uh(hhh}h}(hj#  hj$  usub$765c656f-bcb6-4e5f-a925-386639e4bfe9h
)}(h
}(hX   name: sql_pull_data_from_B
code: sql_pull_data
...
description: >-
  Pull data from a SQL database B. The database B contains information about customer information.
examples: |-
  df, description = sql_pull_data_from_B("pull data from time_series table")
parameters:
  ...
returns:
  ...h}(hj  hj   hKuhh	uh(hhh}h}(hj,  hj-  usub$a52f4edc-73c1-4f90-8551-48d5748c263bh
)}(h
}(hconfigurations:
  ...
  sqlite_db_path: /path/to/sqlite_B.db
```

Let's discuss the differences between the two configurations. 
h}(hj  hj   hKuhh	uh(hhh}h}(hj5  hj6  usub$41433b9c-5906-4ae3-830a-0ea9a5c4af44h
)}(h
}(hX  First, you can see that the `name` field is different, and the names are different from the python file name (without extension) which is `sql_pull_data.py`.
This name is used in CodeInterpreter for code generation. So, you can see that in the `examples` field, h}(hj  hj   hKuhh	uh(hhh}h}(hj>  hj?  usub$8dc0d41f-408f-4082-aa07-13c13a2c03cfh
)}(h
}(hX   the function name is `sql_pull_data_from_A` and `sql_pull_data_from_B`, respectively.

Second, you can see that the `code` field is the same, and the code file name is `sql_pull_data.py`.
This means that the two plugins share the same implementation code. h}(hj  hj   hKuhh	uh(hhh}h}(hjG  hjH  usub$16b905f6-6213-4b44-9ae6-a47655724da8h
)}(h
}(hThe `code` field is optional, and if you don't specify it, the plugin name will be used as the code file name without the extension.

Third, you can see that the `configurations` field is different, and the `sqlite_db_path` is different.h}(hj  hj   hK	uhh	uh(hhh}h}(hjP  hjQ  usub$40d6de88-31ad-4afd-9c4b-1e83a4846e85h
)}(h
}(hX  This means that the two plugins have different configurations.
This is typically the key reason why you want to have multiple plugin configurations that share the same implementation.

Finally, you can see that the `description` field is different, and the descriptions are different.h}(hj  hj   hK
uhh	uh(hhh}h}(hjY  hjZ  usub$bffedb08-f7e8-4e7a-85ea-9d5d50f63d05h
)}(h
}(hX6  This is important because the Planner and the CodeInterpreter will use the description to make decisions and generate code.
The two descriptions should be explicit enough to distinguish the two plugins. 
Otherwise, the Planner and the CodeInterpreter may not be able to make the right decisions.
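
As a rough illustration of why one implementation can serve both configurations, the sketch below uses a simplified stand-in class. It is not the actual `sql_pull_data.py`: the class name `SqlPullDataSketch` is hypothetical, it accepts a raw SQL string rather than a natural-language request, and a real plugin would inherit from `Plugin` and be decorated with `@register_plugin`. The only detail taken from the configurations above is that each plugin reads its own `sqlite_db_path`.

```python
import sqlite3
from contextlib import closing


class SqlPullDataSketch:
    """Hypothetical stand-in for the shared plugin class (not the TaskWeaver base class)."""

    def __init__(self, config: dict):
        # Assumption: in TaskWeaver, `self.config` is populated from the
        # `configurations` field of the plugin's YAML file.
        self.config = config

    def __call__(self, query: str):
        # The same code runs for every plugin; only the configured path differs.
        db_path = self.config.get("sqlite_db_path")
        with closing(sqlite3.connect(db_path)) as conn:
            rows = conn.execute(query).fetchall()
        # Like any plugin, return a natural-language description with the result.
        description = f"Pulled {len(rows)} rows from {db_path}"
        return rows, description
```

Two instances created with the `configurations` of `sql_pull_data_from_A` and `sql_pull_data_from_B` would run the same `__call__` code against different database files.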

## Conclusionh}(hj  hj   hKuhh	uh(hhh}h}(hjb  hjc  usub$a85f0194-19d1-488b-bd9d-27e0851f228bh
)}(h
}(hWhen you meet the situation where you want to have multiple plugin configurations that share the same implementation,
you can use the `code` field to specify the code file name, and use the `configurations` field to specify the configurations.h}(hj  hj   hKuhh	uh(hhh}h}(hjk  hjl  usub$d6e036ef-8609-4d6e-9f58-9c6cf46a84c0h
)}(h
}(hThe `name` field is used to distinguish the plugins, and the `description` field is used to 
help the Planner and the CodeInterpreter make right decisions.h}(hj  hj   hK
uhh	uh(hhh}h}(hjt  hju  usub$b0b4c8c5-c1b6-475e-af9f-8a11043353f5h
)}(h
}(hX  ---
id: plugin_intro
description: Plugin introduction
slug: /plugin/plugin_intro
---

# Plugin Introduction

Plugins are the units that TaskWeaver orchestrates. You can view plugins as tools that the LLM can
use to accomplish certain tasks.
h}(h)docs\customization\plugin\plugin_intro.mdhid: plugin_introhK uhh	uh(hhh}h}(hj}  hj~  usub$82bca61e-911c-491e-9748-71b9c6770ad0h
)}(h
}(hIn TaskWeaver, each plugin is represented as a Python function that can be called within a code snippet. The
orchestration is essentially the process of generating Python code snippets consisting of a certain number of plugins.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$f4168823-f898-4547-b20c-d24390698850h
)}(h
}(hOne concrete example would be pulling data from database and apply anomaly detection. The generated code (simplified) looks like
follows:

```python
df, data_description = sql_pull_data(query="pull data from time_series table")  h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$ba59b610-4f1b-4953-b4cf-c4065d9f4e04h
)}(h
}(hX#  anomaly_df, anomaly_description = anomaly_detection(df, time_col_name="ts", value_col_name="val") 
```

## Plugin Structure

A plugin has two files:

* **Plugin Implementation**: a Python file that defines the plugin
* **Plugin Schema**: a file in yaml that defines the schema of the plugin
h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$11ee8333-34f1-4c49-9dec-2e784bb8ba7bh
)}(h
}(hXW  ## Plugin Implementation

The plugin function needs to be implemented in Python.
To work with TaskWeaver's orchestration, a plugin Python file consists of two parts:

- Plugin function implementation code
- TaskWeaver plugin decorator

The following code shows an example implementation of the anomaly detection plugin:

```pythonh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$4419b91d-adfb-4e5f-bd93-1f94e625029eh
)}(h
}(hX  import pandas as pd
from pandas.api.types import is_numeric_dtype

from taskweaver.plugin import Plugin, register_plugin

@register_plugin
class AnomalyDetectionPlugin(Plugin):
    def __call__(self, df: pd.DataFrame, time_col_name: str, value_col_name: str):

        """h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$5d658fb8-4b84-45b3-b4e8-86ae2806fb80h
)}(h
}(hX+          anomaly_detection function identifies anomalies from an input dataframe of time series.
        It will add a new column "Is_Anomaly", where each entry will be marked with "True" if the value is an anomaly
        or "False" otherwise.

        :param df: the input data, must be a dataframeh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$e74b0bf6-c8c4-408c-af01-42c301d6917ch
)}(h
}(h        :param time_col_name: name of the column that contains the datetime
        :param value_col_name: name of the column that contains the numeric values.
        :return df: a new df that adds an additional "Is_Anomaly" column based on the input df.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$48448a5f-4a70-4784-b590-3b19c33e78e4h
)}(h
}(hXQ          :return description: the description about the anomaly detection results.
        """
        try:
            df[time_col_name] = pd.to_datetime(df[time_col_name])
        except Exception:
            print("Time column is not datetime")
            return

        if not is_numeric_dtype(df[value_col_name]):
            try:h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$8626a32d-c207-489c-96c9-f96766f5da7fh
)}(h
}(hX                  df[value_col_name] = df[value_col_name].astype(float)
            except ValueError:
                print("Value column is not numeric")
                return

        mean, std = df[value_col_name].mean(), df[value_col_name].std()
        cutoff = std * 3h}(hj  hj  hK	uhh	uh(hhh}h}(hj  hj  usub$bc6b2856-2b2c-4972-bec2-39060adbc1a9h
)}(h
}(h        lower, upper = mean - cutoff, mean + cutoff
        df["Is_Anomaly"] = df[value_col_name].apply(lambda x: x < lower or x > upper)
        anomaly_count = df["Is_Anomaly"].sum()h}(hj  hj  hK
uhh	uh(hhh}h}(hj  hj  usub$42671399-4719-494a-a343-aeb773d729d8h
)}(h
}(hX(          description = "There are {} anomalies in the time series data".format(anomaly_count)
        
        self.ctx.add_artifact(
             name="anomaly_detection_results",  # a brief description of the artifact
             file_name="anomaly_detection_results.csv",  # artifact file nameh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$033fe308-c2b8-4e4a-b96e-a4d2eeca6ce3h
)}(h
}(h             type="df",  # artifact data type, support chart/df/file/txt/svg
             val=df,  # variable to be dumped
        )
        
        return df, description

```
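
To experiment with the 3-sigma rule from the plugin above without installing TaskWeaver or pandas, a minimal standard-library sketch could look like the following. The function name `three_sigma_flags` and the sample values are made up for illustration; `statistics.stdev` computes the sample standard deviation, which matches the default of the pandas `std()` call used in the plugin.

```python
from statistics import mean, stdev


def three_sigma_flags(values):
    """Return one boolean per value: True if it lies outside mean +/- 3 * std."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    cutoff = 3 * sigma
    lower, upper = mu - cutoff, mu + cutoff
    return [v < lower or v > upper for v in values]


# Illustrative data: twenty ordinary readings and one spike.
values = [10.0] * 20 + [100.0]
flags = three_sigma_flags(values)

# Same description format as the plugin's return value.
description = "There are {} anomalies in the time series data".format(sum(flags))
print(description)
```

Only the spike at the end falls outside the cutoff here. With very short series, a single outlier inflates the standard deviation enough that nothing may be flagged, which is a known limitation of this rule.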

You need to go through the following steps to implement your own plugin.
h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$77ef14e1-5deb-4025-8c9f-953d59157bd2h
)}(h
}(h1. import the TaskWeaver plugin decorator `from taskWeaver.plugin import Plugin, register_plugin`
2. create your plugin class inherited from `Plugin` parent class (e.g., `AnomalyDetectionPlugin(Plugin)`), which is
   decorated by `@register_plugin`h}(hj  hj  hK
uhh	uh(hhh}h}(hj  hj  usub$1a0cdbd9-daaf-438e-8ce2-f4852c333846h
)}(h
}(hX:  3. implement your plugin function in `__call__` method of the plugin class.  **Most importantly, it is mandatory to
   include `descriptions` of your execution results in the return values of your plugin function**. These descriptions
   can be utilized by the LLM to effectively summarize your execution results.
h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$7366f8db-8a1c-4b82-bf84-0c905b87e00ch
)}(h
}(h> 💡A key difference in a plugin implementation and a normal python function is that it always return a description of
> the result in natural language. As LLMs only understand natural language, it is important to let the model understandh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$92382283-b8e8-4290-8c46-1f8445ce7ecbh
)}(h
}(hX$  > what the execution result is. In the example implementation above, the description says how many anomalies are detected.
> Behind the scenes, only the description is passed to the LLM. In contrast, the execution result (e.g., `df` in
> the above example) is not handled by the LLM.
h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$56f13510-6057-489f-ae66-58abfbd49a33h
)}(h
}(hX#  ### Important Notes

1. If the functionality of your plugin depends on additional libraries or packages, it is essential to ensure that they
   are installed before proceeding.

2. If you wish to persist intermediate results, such as data, figures, or prompts, in your plugin implementation,h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$2ea10623-b982-4002-81d9-8278e5dcb272h
)}(h
}(h   TaskWeaver provides an `add_artifact` API that allows you to store these results in the workspace. In the example we
   provide, if you have performed anomaly detection and obtained results in the form of a CSV file, you can utilizeh}(hj  hj  hKuhh	uh(hhh}h}(hj!  hj"  usub$f13b5b90-cb82-4d84-9c1d-3436f427bb41h
)}(h
}(hX     the `add_artifact` API to save this file as an artifact. The artifacts are stored in the `project/workspace/session_id/cwd` folder in the project directory.

```python
self.ctx.add_artifact(
    name="anomaly_detection_results",  # a brief description of the artifacth}(hj  hj  hKuhh	uh(hhh}h}(hj*  hj+  usub$f6c724b0-b04e-437b-8a2a-714389b859e8h
)}(h
}(h    file_name="anomaly_detection_results.csv",  # artifact file name
    type="df",  # artifact data type, support chart/df/file/txt/svg
    val=df,  # variable to be dumped
)
```

## Plugin Schema

The plugin schema is composed of several parts:
h}(hj  hj  hKuhh	uh(hhh}h}(hj3  hj4  usub$db758b3c-12b5-439f-9f47-d2eb34b089eeh
)}(h
}(h1. **name**: The main function name of the Python code.
2. **enabled**: Determines whether the plugin is enabled for selection during conversations. The default value is `true`.
3. **descriptions**: A brief description that introduces the plugin function.h}(hj  hj  hKuhh	uh(hhh}h}(hj<  hj=  usub$d3246842-d0bb-45ac-9f4f-c4d91820ba41h
)}(h
}(h4. **parameters**: This section lists all the input parameter information. It includes the parameter's name, type,
   whether it is required or optional, and a description providing more details about the parameter.h}(hj  hj  hKuhh	uh(hhh}h}(hjE  hjF  usub$a619318c-5372-44f1-8a37-36c3f7108516h
)}(h
}(hX:  5. **returns**: This section lists all the return value information. It includes the return value's name, type, and
   description that provides information about the value that is returned by the function.

**Note:** The addition of any extra fields would result in a validation failure within the plugin schema.
h}(hj  hj  hKuhh	uh(hhh}h}(hjN  hjO  usub$5de7b1f2-1d1d-4e0f-818d-955b4bf42423h
)}(h
}(hX   The plugin schema is required to be written in YAML format. Here is the plugin schema example of the above anomaly
detection plugin:

```yaml
name: anomaly_detection
enabled: true
required: false
description: >-
  anomaly_detection function identifies anomalies from an input DataFrame ofh}(hj  hj  hKuhh	uh(hhh}h}(hjW  hjX  usub$73add8b4-608c-42eb-b5ec-710b1b8c651ah
)}(h
}(h  time series. It will add a new column "Is_Anomaly", where each entry will be marked with "True" if the value is an anomaly or "False" otherwise.

parameters:
  - name: df
    type: DataFrame
    required: true
    description: >-h}(hj  hj  hKuhh	uh(hhh}h}(hj`  hja  usub$b97d8ac0-c0f9-4565-b3ef-11099983c437h
)}(h
}(hX        the input data from which we can identify the anomalies with the 3-sigma
      algorithm.
  - name: time_col_name
    type: str
    required: true
    description: name of the column that contains the datetime
  - name: value_col_name
    type: str
    required: trueh}(hj  hj  hKuhh	uh(hhh}h}(hji  hjj  usub$28ffa3a8-37c0-4ae9-94c6-a6f04eab838eh
)}(h
}(hX/      description: name of the column that contains the numeric values.

returns:
  - name: df
    type: DataFrame
    description: >-
      This DataFrame extends the input DataFrame with a newly-added column
      "Is_Anomaly" containing the anomaly detection result.
  - name: description
    type: strh}(hj  hj  hKuhh	uh(hhh}h}(hjr  hjs  usub$d4a4510b-11cd-4b33-9a14-0a31a869ce38h
)}(h
}(h    description: This is a string describing the anomaly detection results.

```

In addition, the schema supports two optional fields:

1. **code**: Without specifying the code file name, the plugin schema will use the plugin name as the code file name.h}(hj  hj  hKuhh	uh(hhh}h}(hj{  hj|  usub$84649e70-480b-4de0-a0c6-dd6cdb8494a3h
)}(h
}(h   For example, the plugin name is `anomaly_detection` and the code file name is `anomaly_detection.py`.
   In cases where the plugin name is not the same as the code file name, you can specify the code name (code file) inh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$f8011500-833e-4c6a-9499-58a7f764023eh
)}(h
}(hX      the plugin schema to ensure clarity and accuracy. For example, the plugin name is `anomaly_detection` and the code
    file name is `anomaly_detection_code.py`. Then, you can specify the code name in the plugin schema as follows:
   ```yaml
   code: anomaly_detection_code
   ```h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$156ce136-9963-4c4d-8ebe-ab29e927306bh
)}(h
}(hX2     Note that the code file name should be the same as the code name without the `.py` extension. 
2. **configurations**: When using common code that requires some configuration parameter modifications for different
   plugins, it is important to specify these configuration parameters in the plugin schema.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$af2dec18-5c2c-4a87-a0c3-7654cec732fbh
)}(h
}(hX     The configuration parameters are specified in the plugin schema as follows:
   ```yaml
   configurations:
     key1: value1
     key2: value2
   ```
   These configuration parameters can be accessed in the plugin implementation as follows:
   ```python
   self.config.get("key1")h}(hj  hj  hK uhh	uh(hhh}h}(hj  hj  usub$474ea411-1479-4b5a-af8c-34b14f2e3bc2h
)}(h
}(h!   self.config.get("key2")
   ```h}(hj  hj  hK!uhh	uh(hhh}h}(hj  hj  usub$2dd82038-a2ad-4367-935e-213a8f73aa10h
)}(h
}(h---
id: plugin_only
description: The Plugin Only Mode
slug: /plugin/plugin_only
---

# The Plugin-Only Mode

## What is the plugin-only mode?
The plugin-only mode is a restricted mode of TaskWeaver that only allows you to use plugins. h}(h(docs\customization\plugin\plugin_only.mdhid: plugin_onlyhK uhh	uh(hhh}h}(hj  hj  usub$c665dcf8-5161-4f9a-a28a-7668c6d11a87h
)}(h
}(hX  Compared to the full mode, the plugin-only mode has the following restrictions:

1. The generated code only contains the calls to the plugins. 
   For example, the following code only calls the `ascii_render` plugin and does not contain any "free-form" code.
    ```pythonh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$df29edf3-5f73-4788-b0ca-2a7c19ffdc2dh
)}(h
}(h    r1=ascii_render(text="Hello")
    r1
    ```
    
2. Only the plugins with `plugin_only: true` in the yaml file will be loaded. 
   For example, the following plugin will be loaded in the plugin-only mode:
    ```yaml
    name: ascii_renderh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$1ef366ef-352d-46bb-a17d-826973b4dd55h
)}(h
}(h    code: ascii_render
    plugin_only: true
    ...
    ```
   If this field is not specified, the default value is `false`.
   For plugins in the plugin-only mode, the argument type can only be `str`, `int`, `boolean`, or `float`.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$e09783bb-d381-4c75-9999-aacdf3f18eb4h
)}(h
}(hX
     Other types such as `DataFrame` are not allowed.
   Essentially, we assume that these plugins produce only "text-like" output that the LLM can consume directly.

To enable the plugin-only mode, you can add the configuration `"session.plugin_only_mode": true` h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$e6c05014-7eee-4e1a-8734-813add7ac1bfh
)}(h
}(hX  in the project configuration file `taskweaver_config.json`.

## Why do we need the plugin-only mode?

Although the plugin-only mode is restricted, it is still useful in some scenarios.
For example, you may want to use TaskWeaver to only generate the code to call a certain plugin, h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$770fa234-d85c-4406-863e-de6d53e06e44h
)}(h
}(hX+  and you want to get the response from the plugin directly, without generating any other code 
for safety reasons.

## How is the plugin-only mode implemented?

The plugin-only mode is implemented based on the [function calling](https://platform.openai.com/docs/guides/function-calling) mode of LLMs.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$d88bdbb0-4577-4e54-8eb0-6a289a0d1425h
)}(h
}(hX  In this mode, the LLM is trained to generate a JSON object that contains the function name and the arguments.
For example, the following JSON object is generated by the LLM:
```json
{
  "function": "ascii_render",
  "arguments": {
    "text": "Hello"
  }
}
```h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$302b5d28-85a5-4975-aec3-996e2cecf760h
)}(h
}(hX  With this JSON object, we assemble the code to call the plugin:
```python
r1=ascii_render(text="Hello")
r1
```
Then, we execute the code and get the response from the plugin.
Therefore, the code is not directly generated by the LLM in the plugin-only mode.
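
The assembly step described above can be sketched as follows. This is an illustration only, not TaskWeaver's internal implementation: the helper name `assemble_plugin_call` and the default variable name `r1` are assumptions, and the sketch relies on plugin-only arguments being simple values whose `repr` is valid Python.

```python
import json


def assemble_plugin_call(function_call_json: str, result_var: str = "r1") -> str:
    """Turn a function-calling JSON object into a short snippet that calls the plugin."""
    call = json.loads(function_call_json)
    # repr() renders str/int/float/bool values as Python literals.
    # Strings come out single-quoted, which is equivalent to the double-quoted form.
    args = ", ".join(f"{name}={value!r}" for name, value in call["arguments"].items())
    return f"{result_var}={call['function']}({args})\n{result_var}"


llm_output = '{"function": "ascii_render", "arguments": {"text": "Hello"}}'
code = assemble_plugin_call(llm_output)
print(code)
```

Because the snippet is built from a fixed template, the LLM only controls the function name and argument values, never the surrounding code.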
h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$ecfa66c5-aadc-4188-9c18-4ed3f3b1a3f9h
)}(h
}(h## Which models support the plugin-only mode?

Currently, the plugin-only mode is only supported by 

- [OpenAI models](https://platform.openai.com/docs/guides/function-calling)
- [ZhipuAI models](https://open.bigmodel.cn/dev/api)
h}(hj  hj  hK	uhh	uh(hhh}h}(hj  hj  usub$58b2f656-4437-455d-b131-0c8946b8bcach
)}(h
}(hfLikely other models that are compatible with the OpenAI models will also support the plugin-only mode.h}(hj  hj  hK
uhh	uh(hhh}h}(hj
  hj  usub$b6a38219-61c9-411d-afe1-d9549cc15a61h
)}(h
}(hX;  # Auto Plugin Selection 

In TaskWeaver, we provide an auto plugin selection mechanism to dynamically select the best plugin for each user request.
It aims to solve the following two problems:

1. An excessive number of plugins may cause confusion for LLM, leading to inaccuracies in generating the correct code.h}(h-docs\customization\plugin\plugin_selection.mdh# Auto Plugin SelectionhK uhh	uh(hhh}h}(hj  hj  usub$3d247bea-357a-416f-bf89-1c888019f303h
)}(h
}(hX6  2. A large number of plugins could lead to increased token usage (potentially exceeding the token limit of LLM) and extended response time.

## Auto Plugin Selection Overview

The workflow of the auto plugin selection mechanism is shown below.
![Auto Plugin Selection Overview](../../../static/img/APS.png)
h}(hj  hj  hKuhh	uh(hhh}h}(hj!  hj"  usub$a461bce1-6518-421e-91f4-3747b24e99e0h
)}(h
}(hXc  NOTE: the automatic plugin selection mechanism is only activated during the code generation process in the Code Interpreter and does not affect the planning process of the Planner.

At the start of TaskWeaver initialization, the automatic plugin selector is activated to generate embedding vectors for all plugins, including their names and descriptions. h}(hj  hj  hKuhh	uh(hhh}h}(hj*  hj+  usub$e142aa3a-056d-4690-9c3d-8e75744c3fdbh
)}(h
}(hX<  The embedding vectors are created using the specified embedding model configured in the `taskweaver_config.json` file. 
For more information, please refer to the [embedding](embedding.md) documentation.

When the Planner sends a request to the Code Interpreter, the auto plugin selection mechanism will be triggered.h}(hj  hj  hKuhh	uh(hhh}h}(hj3  hj4  usub$3ad96516-3e8d-4fd7-be3c-871f47d25b5bh
)}(h
}(hXT  It will first generate an embedding vector for the request using the same embedding model.
Then, it will calculate the cosine similarity between the request embedding vector and the embedding vectors of all plugins.
It will select the top-k plugins with the highest cosine similarity scores and load them into the `code_generator` prompt.
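
The ranking step can be illustrated with a small self-contained sketch. The three-dimensional vectors below are invented toy values, not real embeddings, and the helper names `cosine_similarity` and `select_top_k` are hypothetical; TaskWeaver's own selector may be implemented differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_top_k(request_vec, plugin_vecs, k=3):
    """Return the names of the k plugins most similar to the request."""
    ranked = sorted(
        plugin_vecs.items(),
        key=lambda item: cosine_similarity(request_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy embeddings (made up for illustration).
plugin_vecs = {
    "klarna_search": [0.9, 0.1, 0.0],
    "sql_pull_data": [0.5, 0.5, 0.1],
    "paper_summary": [0.2, 0.8, 0.1],
    "ascii_render": [0.0, 0.1, 0.9],
}
request_vec = [1.0, 0.0, 0.0]

selected = select_top_k(request_vec, plugin_vecs, k=3)
print(selected)
```

With these toy vectors, the least similar plugin (`ascii_render`) is left out of the prompt, which is how the mechanism keeps token usage down.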
h}(hj  hj  hKuhh	uh(hhh}h}(hj<  hj=  usub$a43a01b8-59ff-4fb9-9709-80082fee58f2h
)}(h
}(hUpon completing the code generation, the `code_generator` employs one or more plugins to produce the desired code. 
We have established a plugin pool to store the plugins involved in the code generation process while filtering out any unused ones. h}(hj  hj  hKuhh	uh(hhh}h}(hjE  hjF  usub$4926dbd2-329e-4f72-8199-893c0767da8dh
)}(h
}(hX  During the subsequent automatic plugin selection phase, newly chosen plugins are appended to the existing ones. 

## Auto Plugin Selection Configuration
- `code_generator.enable_auto_plugin_selection`: Whether to enable auto plugin selection. The default value is `false`.h}(hj  hj  hKuhh	uh(hhh}h}(hjN  hjO  usub$bae3224b-1963-4e46-91f7-d95db5bc315eh
)}(h
}(hX@  - `code_generator.auto_plugin_selection_topk`:	The number of auto selected plugins in each round. The default value is `3`.

## Auto Plugin Selection Preparation

Before using the auto plugin selection mechanism, we need to run the following command to generate the plugin meta files with embeddings.

```bash
cd scriptsh}(hj  hj  hKuhh	uh(hhh}h}(hjW  hjX  usub$734613a2-17e9-4634-b56d-863bb39a409dh
)}(h
}(hX$  python -m plugin_mgt --refresh
```
After that, a `.meta` directory is generated in the `plugins` folder.
Then you can start a new TaskWeaver session with the auto plugin selection mechanism enabled.
The Code Generator will automatically load the plugin meta files with embeddings.
h}(hj  hj  hKuhh	uh(hhh}h}(hj`  hja  usub$0ca9d914-710c-467b-b6af-d20e33c01886h
)}(h
}(h🎈Plugin meta files will be treated as invalid if:
  - The plugin embedding vector is not generated.
  - The plugin is modified.
  - The plugin embedding model is changed.
h}(hj  hj  hK	uhh	uh(hhh}h}(hji  hjj  usub$f13f89c0-b754-4ed2-9e4e-55e4024cef39h
)}(h
}(hXC  In this case, you cannot start the TaskWeaver and you need to run the above command again to refresh the plugin meta files.

## Auto Plugin Selection Example

We show the auto plugin selection mechanism in the following example.

First, we start TaskWeaver with the auto plugin selection mechanism enabled.
```bashh}(hj  hj  hK
uhh	uh(hhh}h}(hjr  hjs  usub$1e71d803-d063-4e1c-9de7-62514b4175ddh
)}(h
}(h=========================================================
 ___         _     _       __
|_   _|_ _ ___| | _ | |     / /__  ___ __   ___  ___
  | |/ _` / __| |/ /| | /| / / _ \/ __ `/ | / / _ \/ ___/h}(hj  hj  hKuhh	uh(hhh}h}(hj{  hj|  usub$7893f609-0d5c-4120-b902-dad8cdc18fddh
)}(h
}(h  | | (_| \__ \   < | |/ |/ /  __/ /_/ /| |/ /  __/ /
  |_|\__,_|___/_|\_\|__/|__/\___/\__,_/ |___/\___/_/
=========================================================h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$77c06cb3-821a-47b1-a0cb-157689a40365h
)}(h
}(hoTaskWeaver: I am TaskWeaver, an AI assistant. To get started, could you please enter your request?
Human: 
```
h}(hj  hj  hK
uhh	uh(hhh}h}(hj  hj  usub$280c42f5-8e36-4295-b500-24b840f27581h
)}(h
}(hX	  Then we can check the log file `task_weaver.log` in the `logs` folder to see the auto plugin selector is initialized successfully because the `Plugin embeddings generated` message is printed.
```bash
2023-12-18 14:23:44,197 - INFO - Planner initialized successfullyh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$daa9dc26-4719-40a4-a136-091313af141ch
)}(h
}(h2023-12-18 14:24:10,488 - INFO - Plugin embeddings generated
2023-12-18 14:24:10,490 - INFO - CodeInterpreter initialized successfully.h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$83d6a479-ce7f-4dbc-8f47-1dc594a08bcdh
)}(h
}(h2023-12-18 14:24:10,490 - INFO - Session 20231218-062343-c18494b1 is initialized
```
We ask TaskWeaver to "search Xbox price for me".
The Planner instructs the Code Interpreter to search Xbox price.

```bashh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$1e51d619-a68d-4b51-9536-ff2a9dabbdebh
)}(h
}(hTaskWeaver: I am TaskWeaver, an AI assistant. To get started, could you please enter your request?
Human: search xbox price for me
>>> [INIT_PLAN]
1. search xbox price
2. report the result to the user <interactively depends on 1>
>>> [PLAN]h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$6f84fbe1-b921-423e-81a5-5769573e6197h
)}(h
}(hX  1. instruct CodeInterpreter to search xbox price
2. report the result to the user
>>> [CURRENT_PLAN_STEP]
1. instruct CodeInterpreter to search xbox price
>>> [SEND_TO]
CodeInterpreter
>>> [MESSAGE]
Please search xbox price
>>> [PLANNER->CODEINTERPRETER]
Please search xbox price
```
h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$591cfbf8-397a-4614-884a-b7c0a2c22b99h
)}(h
}(hBack to the Code Interpreter, the auto plugin selection mechanism is triggered.
We can check the log file `task_weaver.log` again to see that the auto plugin selector selected the top-3 plugins with the highest cosine similarity scores.
```bashh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$c374d938-6ff9-48b6-8b08-109934d280dbh
)}(h
}(ht023-12-18 14:24:34,513 - INFO - Planner talk to CodeInterpreter: Please search xbox price using klarna_search pluginh}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$c5808c98-e963-43db-9026-fbca55a49788h
)}(h
}(hf2023-12-18 14:24:34,669 - INFO - Selected plugins: ['klarna_search', 'sql_pull_data', 'paper_summary']h}(hj  hj  hKuhh	uh(hhh}h}(hj  hj  usub$4fc402ce-9384-4fa1-a32e-ac8b67865cd3h
)}(h
}(h2023-12-18 14:24:34,669 - INFO - Selected plugin pool: ['klarna_search', 'sql_pull_data', 'paper_summary']
```

Then the Code Interpreter will generate the code using the selected plugins.
````bash
>>> [THOUGHT]
ProgramApe will call the klarna_search plugin function to search for Xbox prices.
>>> [PYTHON]
search_results, description = klarna_search(query="xbox")
search_results, description
>>> [VERIFICATION]
NONE
>>> [STATUS]
SUCCESS
>>> [RESULT]
The execution of the generated python code above has succeeded

The result of above Python code after execution is:
(                                                 name    price                                                url                                         attributes
 0             Microsoft Xbox Series X - Black Edition  $399.00  https://www.klarna.com/us/shopping/pl/cl52/495...  [Release Year:2020, Included Accessories:1 gam...
 1                 Microsoft Xbox Series S 1TB - Black  $349.00  https://www.klarna.com/us/shopping/pl/cl52/320...  [Included Accessories:1 gamepad, Media Type:DV...
 ..                                                ...      ...                                                ...                                                ...
 3                      Xbox Series S – Starter Bundle  $239.00  https://www.klarna.com/us/shopping/pl/cl52/320...                                [Platform:Xbox One]
 4   Microsoft Xbox Series X 1TB Console - Diablo I...  $385.58  https://www.klarna.com/us/shopping/pl/cl52/320...  [Release Year:2023, Included Accessories:1 gam...

 [5 rows x 4 columns],
 'The response is a dataframe with the following columns: name, price, url, attributes. The attributes column is a list of tags. The price is in the format of $xx.xx.')
>>> [CODEINTERPRETER->PLANNER]
The following python code has been executed:
```python
search_results, description = klarna_search(query="xbox")
search_results, description
```

The execution of the generated python code above has succeeded

The result of above Python code after execution is:
(                                                 name    price                                                url                                         attributes
 0             Microsoft Xbox Series X - Black Edition  $399.00  https://www.klarna.com/us/shopping/pl/cl52/495...  [Release Year:2020, Included Accessories:1 gam...
 1                 Microsoft Xbox Series S 1TB - Black  $349.00  https://www.klarna.com/us/shopping/pl/cl52/320...  [Included Accessories:1 gamepad, Media Type:DV...
 ..                                                ...      ...                                                ...                                                ...
 3                      Xbox Series S – Starter Bundle  $239.00  https://www.klarna.com/us/shopping/pl/cl52/320...                                [Platform:Xbox One]
 4   Microsoft Xbox Series X 1TB Console - Diablo I...  $385.58  https://www.klarna.com/us/shopping/pl/cl52/320...  [Release Year:2023, Included Accessories:1 gam...

 [5 rows x 4 columns],
 'The response is a dataframe with the following columns: name, price, url, attributes. The attributes column is a list of tags. The price is in the format of $xx.xx.')
````

Finally, the Planner will report the result to the user.
```bash
>>> [INIT_PLAN]
1. search xbox price using klarna_search plugin
2. report the result to the user <interactively depends on 1>
>>> [PLAN]
1. instruct CodeInterpreter to search xbox price using klarna_search plugin
2. report the result to the user
>>> [CURRENT_PLAN_STEP]
2. report the result to the user
>>> [SEND_TO]
User
>>> [MESSAGE]
Here are some Xbox prices I found:

1. Microsoft Xbox Series X - Black Edition: $399.00
2. Microsoft Xbox Series S 1TB - Black: $349.00
3. Xbox Series S – Starter Bundle: $239.00
4. Microsoft Xbox Series X 1TB Console - Diablo Immortal Edition: $385.58

Please note that these prices are subject to change and availability.
>>> [PLANNER->USER]
Here are some Xbox prices I found:

1. Microsoft Xbox Series X - Black Edition: $399.00
2. Microsoft Xbox Series S 1TB - Black: $349.00
3. Xbox Series S – Starter Bundle: $239.00
4. Microsoft Xbox Series X 1TB Console - Diablo Immortal Edition: $385.58

Please note that these prices are subject to change and availability.
TaskWeaver: Here are some Xbox prices I found:

1. Microsoft Xbox Series X - Black Edition: $399.00
2. Microsoft Xbox Series S 1TB - Black: $349.00
3. Xbox Series S – Starter Bundle: $239.00
4. Microsoft Xbox Series X 1TB Console - Diablo Immortal Edition: $385.58

Please note that these prices are subject to change and availability.
```

---
description: Using LLMs from OpenAI/AOAI
---
# Azure OpenAI

1. Create an account on [Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service) and get your API key.
2. Add the following to your `taskweaver_config.json` file:
```json showLineNumbers
{
  "llm.api_base":"YOUR_AOAI_ENDPOINT",
  "llm.api_key":"YOUR_API_KEY",
  "llm.api_type":"azure",
  "llm.auth_mode":"api-key",
  "llm.model":"gpt-4-1106-preview",
  "llm.response_format": "json_object"
}
```

:::tip
`llm.model` is the model name you want to use.
You can find the list of models [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models).
:::

:::info
For `gpt-4-1106-preview` and `gpt-3.5-turbo-1106`, `llm.response_format` can be set to `json_object`.
However, for the earlier models, which do not support JSON response explicitly, `llm.response_format` should be set to `null`.
:::

3. Start TaskWeaver and chat with TaskWeaver.
You can refer to the [Quick Start](../quickstart.md) for more details.

# Customized LLM API

We welcome developers to use their own customized LLM API in TaskWeaver.
This tutorial shows how to contribute your LLM API to TaskWeaver.

1. Create a new Python script `<your_LLM_name>.py` in the `taskweaver/llm` folder.
2. Import the `CompletionService`, `LLMServiceConfig` and `EmbeddingService` from `taskweaver.llm.base` and other necessary libraries.
```python
from typing import Any, Generator, List, Optional

from injector import inject
from taskweaver.llm.base import CompletionService, EmbeddingService, LLMServiceConfig
from taskweaver.llm.util import ChatMessageType
...
```
3. Create a new class `YourLLMServiceConfig` that inherits from `LLMServiceConfig` and implements the `_configure` method.
In this method, you can set the name, API key, model name, backup model name, and embedding model name of your LLM.
```python
class YourLLMServiceConfig(LLMServiceConfig):
    def _configure(self) -> None:
        self._set_name("your_llm_name")

        shared_api_key = self.llm_module_config.api_key
        self.api_key = self._get_str(
            "api_key",
            shared_api_key,
        )

        shared_model = self.llm_module_config.model
        self.model = self._get_str(
            "model",
            shared_model if shared_model is not None else "your_llm_model_name",
        )

        shared_backup_model = self.llm_module_config.backup_model
        self.backup_model = self._get_str(
            "backup_model",
            shared_backup_model if shared_backup_model is not None else self.model,
        )

        shared_embedding_model = self.llm_module_config.embedding_model
        self.embedding_model = self._get_str(
            "embedding_model",
            shared_embedding_model if shared_embedding_model is not None else self.model,
        )
```
4. Create a new class `YourLLMService` that inherits from `CompletionService` and `EmbeddingService` and implements the `chat_completion` and `get_embeddings` methods.
```python
class YourLLMService(CompletionService, EmbeddingService):
    @inject
    def __init__(self, config: YourLLMServiceConfig):
        self.config = config
        pass

    def chat_completion(
        self,
        messages: List[ChatMessageType],
        use_backup_engine: bool = False,
        stream: bool = True,
        temperature: Optional[float] = None,
        max_tokens: Optional[int] = None,
        top_p: Optional[float] = None,
        stop: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> Generator[ChatMessageType, None, None]:
        pass
    
    def get_embeddings(self, strings: List[str]) -> List[List[float]]:
        pass
```
Note:
- `chat_completion` streams by default (`stream=True`).
- You need to use `self.config` to get the configuration variables of your LLM API (e.g., API key/model name) in the `YourLLMService` class.
- The `get_embeddings` method is optional.
- If you need to import other libraries for your LLM API, please import them in the `__init__` function of the `YourLLMService` class.
You can refer to [QWen dashscope library import](https://github.com/microsoft/TaskWeaver/blob/main/taskweaver/llm/qwen.py) for an example.
h}(hj  hj  hK
uhh	uh(hhh}h}(hjA  hjB  usub$a086b8c4-cbaa-481b-8b51-448b9a5feea0h
)}(h
}(h5. Register your LLM service in `taskweaver/llm/__init__.py` by adding your LLM service to the `LLMApi` `__init__` function .
```python 
......
from .your_llm_name import YourLLMService # import your LLM service here
h}(hj  hj  hKuhh	uh(hhh}h}(hjJ  hjK  usub$d1a6f219-69b5-4ba1-9804-7adeb90bc782h
)}(h
}(hclass LLMApi(object):
    @inject
    def __init__(self, config: LLMModuleConfig, injector: Injector) -> None:
        self.config = config
        self.injector = injector

        if self.config.api_type in ["openai", "azure", "azure_ad"]:
            self._set_completion_service(OpenAIService)
            
        ......
        
        elif self.config.api_type == "your_llm_name":
            self._set_completion_service(YourLLMService) # register your LLM service here
        else:
            raise ValueError(f"API type {self.config.api_type} is not supported")

        if self.config.embedding_api_type in ["openai", "azure", "azure_ad"]:
            self._set_embedding_service(OpenAIService)
            
        ......
        
        elif self.config.embedding_api_type == "azure_ml":
            self.embedding_service = PlaceholderEmbeddingService(
                "Azure ML does not support embeddings yet. Please configure a different embedding API.",
            )
        # register your embedding service here; if you do not have one, use `PlaceholderEmbeddingService` as in the lines above
        elif self.config.embedding_api_type == "your_llm_name":
            self._set_embedding_service(YourLLMService)
        else:
            raise ValueError(
                f"Embedding API type {self.config.embedding_api_type} is not supported",
            )
```

6. Configure the `taskweaver_config.json` file in the `project` directory based on your implemented LLM API.
7. Run the following command to test your LLM API. If the LLM API is successfully set up, you will see the response from your LLM API.
```bash
cd ./scripts
python llm_api_test.py
```
You can also specify the project directory and the query to send to your LLM API by using the following command:
```bash
python llm_api_test.py --project <your_project_path> --query "hello, what can you do?" 
```

# Gemini

1. Create an account on [Google AI](https://ai.google.dev/) and get your API key.
2. Add the following content to your `taskweaver_config.json` file:
```json showLineNumbers
{
"llm.api_type": "google_genai",
"llm.google_genai.api_key": "YOUR_API_KEY",
"llm.google_genai.model": "gemini-pro"
}
```

3. Start TaskWeaver and chat with TaskWeaver.
You can refer to the [Quick Start](../quickstart.md) for more details.

# GLM

1. GLM (ChatGLM) is an LLM developed by Zhipu AI and Tsinghua KEG. Go to [ZhipuAI](https://open.bigmodel.cn/), register an account, and get an API key. More details can be found [here](https://open.bigmodel.cn/overview).
2. Install the required package `zhipuai`.
```bash
pip install zhipuai
```
3. Add the following configuration to `taskweaver_config.json`:
```json showLineNumbers
{
  "llm.api_type": "zhipuai",
  "llm.model": "glm-4",
  "llm.embedding_model": "embedding-2",
  "llm.embedding_api_type": "zhipuai",
  "llm.api_key": "YOUR_API_KEY"
}
```
NOTE: `llm.model` is the model name of the ZhipuAI API.
You can find the model name in the [GLM model list](https://open.bigmodel.cn/dev/api#language).

4. Start TaskWeaver and chat with TaskWeaver. 
You can refer to the [Quick Start](../quickstart.md) for more details.

---
description: List for all supported LLMs 
---
# Supported LLMs 

```mdx-code-block
import DocCardList from '@theme/DocCardList';

<DocCardList />
```

---
description: Using LLMs from LiteLLM
---

# LiteLLM

:::info
[LiteLLM](https://docs.litellm.ai/) provides a unified interface to call 100+ LLMs using the same Input/Output format, including OpenAI, Huggingface, Anthropic, vLLM, Cohere, and even custom LLM API servers. Taking LiteLLM as the bridge, many LLMs can be onboarded to TaskWeaver. Here we use the OpenAI Proxy Server provided by LiteLLM for the configuration.
:::

1. Install LiteLLM Proxy and configure the LLM server by following the instructions [here](https://docs.litellm.ai/docs/proxy/quick_start). In general, there are a few steps:
    1. Install the package `pip install litellm[proxy]`
    2. Set up the API key and other necessary environment variables, which vary by LLM. Taking [Cohere](https://cohere.com/) as an example, you need to run `export COHERE_API_KEY=my-api-key`.
    3. Run the LiteLLM proxy server with `litellm --model MODEL_NAME --drop_params`; for Cohere, for example, the model name can be `command-nightly`. The `--drop_params` argument is used to ensure API compatibility. A server will then be started automatically on `http://0.0.0.0:8000`.

:::tip
The full list of models supported by LiteLLM can be found on this [page](https://docs.litellm.ai/docs/providers).
:::

2. Add the following content to your `taskweaver_config.json` file:

```json showLineNumbers
{
  "llm.api_base": "http://0.0.0.0:8000",
  "llm.api_key": "anything",
  "llm.model": "gpt-3.5-turbo"
}
```

:::info
`llm.api_key` and `llm.model` are mainly used as placeholders for the API call; their actual values are not used. If the configuration does not work, please refer to the LiteLLM [documents](https://docs.litellm.ai/docs/proxy/quick_start) to test locally whether you can send requests to the LLM.
:::

3. Open a new terminal, start TaskWeaver and chat.
You can refer to the [Quick Start](../quickstart.md) for more details.

# Ollama

1. Go to [Ollama](https://github.com/jmorganca/ollama) and follow the instructions to serve an LLM in your local environment.
The short example below shows how to set up Ollama; the steps may change as Ollama is updated.

```bash title="install ollama and serve LLMs in local" showLineNumbers
## Install ollama on Linux & WSL2
curl https://ollama.ai/install.sh | sh
## Run the serving
ollama serve
```
Open another terminal and run:
```bash
ollama run llama2:13b
```
:::tip
We recommend deploying an LLM with at least 13B parameters for better performance (such as Llama 2 13B).
:::

:::info
When serving LLMs via Ollama, it will by default start a server at `http://localhost:11434`, which will later be used as the API base in `taskweaver_config.json`.
:::

2. Add the following configuration to `taskweaver_config.json`:
```json showLineNumbers
{
    "llm.api_base": "http://localhost:11434",
    "llm.api_key": "ARBITRARY_STRING",
    "llm.api_type": "ollama",
    "llm.model": "llama2:13b"
}
```
NOTE: `llm.api_base` is the URL of the Ollama server, and `llm.model` is the Ollama model name; it should match the model you served earlier.

3. Start TaskWeaver and chat with TaskWeaver. 
You can refer to the [Quick Start](../quickstart.md) for more details.

---
description: Using LLMs from OpenAI
---
# OpenAI

1. Create an account on [OpenAI](https://beta.openai.com/) and get your [API key](https://platform.openai.com/api-keys).
2. Add the following to your `taskweaver_config.json` file:
```json showLineNumbers
{
  "llm.api_type":"openai",
  "llm.api_base": "https://api.openai.com/v1",
  "llm.api_key": "YOUR_API_KEY",
  "llm.model": "gpt-4-1106-preview",
  "llm.response_format": "json_object"
}
```
:::tip
`llm.model` is the model name you want to use.
You can find the list of models [here](https://platform.openai.com/docs/models).
:::

:::info
For `gpt-4-1106-preview` and `gpt-3.5-turbo-1106`, `llm.response_format` can be set to `json_object`.
However, for the earlier models which do not support JSON response explicitly, `llm.response_format` should be set to `null`.
:::
3. Start TaskWeaver and chat with TaskWeaver.
You can refer to the [Quick Start](../quickstart.md) for more details.
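
The rule for `llm.response_format` described above can be captured in a small helper that generates the configuration. This is only a sketch under stated assumptions: `build_openai_config` and `JSON_MODE_MODELS` are hypothetical names that are not part of TaskWeaver, and the model set contains only the two models this page names.

```python
import json
from typing import Dict, Optional

# Models that this page says accept "json_object" as the response format.
JSON_MODE_MODELS = {"gpt-4-1106-preview", "gpt-3.5-turbo-1106"}


def build_openai_config(api_key: str, model: str) -> Dict[str, Optional[str]]:
    """Build the key/value pairs for taskweaver_config.json (hypothetical helper)."""
    # Earlier models get None, which json.dumps serializes as null.
    response_format = "json_object" if model in JSON_MODE_MODELS else None
    return {
        "llm.api_type": "openai",
        "llm.api_base": "https://api.openai.com/v1",
        "llm.api_key": api_key,
        "llm.model": model,
        "llm.response_format": response_format,
    }


# Print the JSON for an earlier model; paste the output into taskweaver_config.json.
print(json.dumps(build_openai_config("YOUR_API_KEY", "gpt-4"), indent=2))
```

For `gpt-4`, the printed configuration contains `"llm.response_format": null`, while for `gpt-4-1106-preview` it contains `"json_object"`.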
# QWen

1. QWen (Tongyi Qianwen) is an LLM developed by Alibaba. Go to [QWen](https://dashscope.aliyun.com/), register an account, and get an API key. More details can be found [here](https://help.aliyun.com/zh/dashscope/developer-reference/activate-dashscope-and-create-an-api-key?spm=a2c4g.11186623.0.0.7b5749d72j3SYU) (in Chinese).
2. Install the required package `dashscope`.
```bash
pip install dashscope
```
3. Add the following configuration to `taskweaver_config.json`:
```json showLineNumbers
{
    "llm.api_type": "qwen",
    "llm.model": "qwen-max",
    "llm.api_key": "YOUR_API_KEY"
}
```
NOTE: `llm.model` is the model name of the QWen LLM API.
You can find the model name in the [QWen LLM model list](https://help.aliyun.com/zh/dashscope/developer-reference/model-square/?spm=a2c4g.11186623.0.0.35a36ffdt97ljI).

4. Start TaskWeaver and chat with TaskWeaver. 
You can refer to the [Quick Start](../quickstart.md) for more details.

# Terminal

1. Follow the instructions in [Quick Start](../quickstart.md) to clone the repo and configure the project.

```bash
git clone https://github.com/microsoft/TaskWeaver.git
cd TaskWeaver
# install the requirements
pip install -r requirements.txt
```

```json
{
"llm.api_key": "the api key",
"llm.model": "the model name, e.g., gpt-4"
}
```

2. Run the following command in the terminal.
```bash
# assume you are in the taskweaver folder
# -p is the path to the project directory
python -m taskweaver -p ./project/
```
This will start the TaskWeaver process and you can interact with it through the command line interface. 
If everything goes well, you will see the following prompt:

```bash
=========================================================
 ___         _     _       __
|_   _|_ _ ___| | _ | |     / /__  ___ __   ___  ___
  | |/ _` / __| |/ /| | /| / / _ \/ __ `/ | / / _ \/ ___/
  | | (_| \__ \   < | |/ |/ /  __/ /_/ /| |/ /  __/ /
  |_|\__,_|___/_|\_\|__/|__/\___/\__,_/ |___/\___/_/
=========================================================
TaskWeaver: I am TaskWeaver, an AI assistant. To get started, could you please enter your request?
Human: ___
```

# Library

If you want to use TaskWeaver as a library, you can refer to the following code example:

```python
from taskweaver.app.app import TaskWeaverApp

# This is the folder that contains the taskweaver_config.json file and not the repo root. Defaults to "./project/"
app_dir = "./project/"
app = TaskWeaverApp(app_dir=app_dir)
session = app.get_session()

user_query = "hello, what can you do?"
response_round = session.send_message(user_query)
print(response_round.to_dict())
```
**Note:**
- `response_round`: the response from TaskWeaver, which is an object of the `Round` class.
  An example of the `Round` object is shown below:
```json
{
    "id": "round-20231201-043134-218a2681",
    "user_query": "hello, what can you do?",
    "state": "finished",
    "post_list": [
        {
            "id": "post-20231201-043134-10eedcca",
            "message": "hello, what can you do?",
            "send_from": "User",
            "send_to": "Planner",
            "attachment_list": []
        },
        {
            "id": "post-20231201-043141-86a2aaff",
            "message": "I can help you with various tasks, such as counting rows in a data file, detecting anomalies in a dataset, searching for products on Klarna, summarizing research papers, and pulling data from a SQL database. Please provide more information about the task you want to accomplish, and I'll guide you through the process.",
            "send_from": "Planner",
            "send_to": "User",
            "attachment_list": [
                {
                    "id": "atta-20231201-043141-6bc4da86",
                    "type": "init_plan",
                    "content": "1. list the available functions"
                },
                {
                    "id": "atta-20231201-043141-6f29f6c9",
                    "type": "plan",
                    "content": "1. list the available functions"
                },
                {
                    "id": "atta-20231201-043141-76186c7a",
                    "type": "current_plan_step",
                    "content": "1. list the available functions"
                }
            ]
        }
    ]
}
```
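
Given a dictionary shaped like the `Round` example above, the Planner's reply and its attachments can be pulled out with a few lines of Python. This is a minimal sketch: `replies_to_user` and `attachments_of_type` are hypothetical helper names rather than TaskWeaver APIs, and `example_round` is a trimmed copy of the example with a shortened message.

```python
from typing import Any, Dict, List


def replies_to_user(round_dict: Dict[str, Any]) -> List[str]:
    """Return the messages of all posts addressed to the user."""
    return [
        post["message"]
        for post in round_dict.get("post_list", [])
        if post.get("send_to") == "User"
    ]


def attachments_of_type(round_dict: Dict[str, Any], attachment_type: str) -> List[str]:
    """Return the contents of all attachments with the given type."""
    return [
        attachment["content"]
        for post in round_dict.get("post_list", [])
        for attachment in post.get("attachment_list", [])
        if attachment.get("type") == attachment_type
    ]


# Trimmed version of the example above (message text shortened for brevity).
example_round = {
    "user_query": "hello, what can you do?",
    "state": "finished",
    "post_list": [
        {
            "message": "hello, what can you do?",
            "send_from": "User",
            "send_to": "Planner",
            "attachment_list": [],
        },
        {
            "message": "I can help you with various tasks.",
            "send_from": "Planner",
            "send_to": "User",
            "attachment_list": [
                {"type": "init_plan", "content": "1. list the available functions"},
                {"type": "plan", "content": "1. list the available functions"},
            ],
        },
    ],
}

print(replies_to_user(example_round))
print(attachments_of_type(example_round, "plan"))
```

In a real session, the same helpers could be applied to `response_round.to_dict()` from the library example.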
# Web UI

1. Follow the instructions in [Quick Start](../quickstart.md) to clone the repo and configure the project.

```bash
git clone https://github.com/microsoft/TaskWeaver.git
cd TaskWeaver
# install the requirements
pip install -r requirements.txt
```

```json
{
"llm.api_key": "the api key",
"llm.model": "the model name, e.g., gpt-4"
}
```

2. Install the `chainlit` package with `pip install chainlit==1.0.100` if it is not already in your environment.

3. Start the service by running the following command.

```bash
cd playground/UI/
chainlit run app.py
```

4. Open http://localhost:8000 in your browser to start the trial.

5. Below are some screenshots:h}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usub$b9dda132-14e2-47b9-b58a-f8df7a2661b8h
)}(h
}(h![TaskWeaver UI Screenshot 1](../../static/img/ui_screenshot_1.png)
![TaskWeaver UI Screenshot 2](../../static/img/ui_screenshot_2.png)h}(hjq  hjr  hKuhh	uh(hhh}h}(hj  hj  usubusb}(K hKhKh(Kh1Kh:KhCKhLKhUKh^K	hgK
hpKhyKhK
hKhKhKhKhKhKhKhKhKhKhKhKhKj  Kj  Kj  Kj  Kj&  Kj/  K j8  K!jA  K"jJ  K#jS  K$j\  K%je  K&jn  K'jw  K(j  K)j  K*j  K+j  K,j  K-j  K.j  K/j  K0j  K1j  K2j  K3j  K4j  K5j  K6j   K7j	  K8j  K9j  K:j$  K;j-  K<j6  K=j?  K>jH  K?jQ  K@jZ  KAjc  KBjl  KCju  KDj~  KEj  KFj  KGj  KHj  KIj  KJj  KKj  KLj  KMj  KNj  KOj  KPj  KQj  KRj  KSj  KTj  KUj  KVj   KWj)  KXj2  KYj;  KZjD  K[jM  K\jV  K]j_  K^jh  K_jq  K`jz  Kaj  Kbj  Kcj  Kdj  Kej  Kfj  Kgj  Khj  Kij  Kjj  Kkj  Klj  Kmj  Knj  Koj  Kpj
  Kqj  Krj  Ksj%  Ktj.  Kuj7  Kvj@  KwjI  KxjR  Kyj[  Kzjf  K{jo  K|jx  K}j  K~j  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj
  Kj  Kj  Kj%  Kj.  Kj7  Kj@  KjI  KjR  Kj[  Kjd  Kjm  Kjv  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj#  Kj.  Kj7  KjB  KjK  KjT  Kj]  Kjf  Kjo  Kjx  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj
  Kj  Kj  Kj%  Kj.  Kj7  Kj@  KjI  KjR  Kj[  Kjd  Kjm  Kjv  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj!  Kj*  Kj3  Kj<  KjE  KjN  KjW  Kj`  Kji  Kjr  Kj{  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj  Kj	  Kj	  Kj	  Kj	  M j&	  Mj/	  Mj8	  MjA	  MjJ	  MjS	  Mj\	  Mje	  Mjn	  M	jw	  M
j	  Mj	  Mj	  M
j	  Mj	  Mj	  Mj	  Mj	  Mj	  Mj	  Mj	  Mj	  Mj	  Mj	  Mj	  Mj
  Mj
  Mj
  Mj"
  Mj+
  Mj4
  Mj=
  M jF
  M!jO
  M"jX
  M#ja
  M$jj
  M%js
  M&j|
  M'j
  M(j
  M)j
  M*j
  M+j
  M,j
  M-j
  M.j
  M/j
  M0j
  M1j
  M2j
  M3j
  M4j
  M5j  M6j  M7j  M8j  M9j'  M:j0  M;j9  M<jB  M=jK  M>jT  M?j]  M@jf  MAjo  MBjx  MCj  MDj  MEj  MFj  MGj  MHj  MIj  MJj  MKj  MLj  MMj  MNj  MOj  MPj  MQj  MRj  MSj  MTj  MUj#  MVj,  MWj5  MXj>  MYjG  MZjP  M[jY  M\jb  M]jk  M^jt  M_j}  M`j  Maj  Mbj  Mcj  Mdj  Mej  Mfj  Mgj  Mhj  Mij  Mjj  Mkj  Mlj  Mmj  Mnj
  Moj

  Mpj
  Mqj
  Mrj(
  Msj1
  Mtj:
  MujC
  MvjL
  MwjU
  Mxj^
  Myjg
  Mzjp
  M{jy
  M|j
  M}j
  M~j
  Mj
  Mj
  Mj
  Mj
  Mj
  Mj
  Mj
  Mj
  Mj
  Mj
  Mj
  Mj   Mj	  Mj  Mj  Mj$  Mj-  Mj6  Mj?  MjH  MjQ  MjZ  Mjc  Mjl  Mju  Mj~  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj   Mj)  Mj2  Mj;  MjD  MjM  MjV  Mj_  Mjh  Mjq  Mjz  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj
  Mj  Mj  Mj%  Mj.  Mj7  Mj@  MjI  MjR  Mj[  Mjd  Mjm  Mjv  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj#  Mj,  Mj5  Mj>  MjG  MjP  MjY  Mjb  Mjk  Mjv  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj!  M j*  Mj3  Mj<  MjE  MjN  MjW  Mj`  Mji  Mjr  M	j}  M
j  Mj  Mj  M
j  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj!  Mj*  Mj3  Mj<  MjE  M jN  M!jW  M"j`  M#ji  M$jr  M%j{  M&j  M'j  M(j  M)j  M*j  M+j  M,j  M-j  M.j  M/j  M0j  M1j  M2j  M3j  M4j  M5j
  M6j  M7j  M8j(  M9j1  M:j:  M;jC  M<jL  M=jU  M>j^  M?jg  M@jp  MAjy  MBj  MCj  MDj  MEj  MFj  MGj  MHj  MIj  MJj  MKj  MLj  MMj  MNj  MOj  MPj  MQj  MRj  MSj  MTj&  MUj/  MVj8  MWjA  MXjJ  MYjS  MZj\  M[je  M\jn  M]jw  M^j  M_j  M`j  Maj  Mbj  Mcj  Mdj  Mej  Mfj  Mgj  Mhj  Mij  Mjj  Mkj  Mlj   Mmj	  Mnj  Moj  Mpj&  Mqj/  Mrj8  MsjA  MtjJ  MujS  Mvj\  Mwje  Mxjn  Myjw  Mzj  M{j  M|j  M}j  M~j  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj"  Mj+  Mj4  Mj=  MjF  MjO  MjX  Mja  Mjj  Mjs  Mj|  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj"  Mj+  Mj4  Mj=  MjF  MjO  MjX  Mja  Mjj  Mjs  Mj|  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj&  Mj/  Mj:  MjC  MjL  MjU  Mj^  Mjg  Mjp  Mjy  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj  Mj#  Mj,  Mj5  Mj>  MjG  MjP  MjY  Mjb  Mjk  Mjv  Mj  Mj  u.