# Usage

cd beyond_vector_search
source venv/bin/activate
./run.sh


## Input Data

The repository includes both the data used for the evaluation and the data generated by the example run of our script.

The example data can be found under "data/example".

The actual data used for gathering metrics can be found in "data/workloads", which contains the following files:
```
cv0_05_num20_prob0_1.csv  cv0_1_num10_prob0_1.csv  cv0_3_num5_prob0_1.csv  cv0_5_num3_prob0_1.csv  cv0_7_num2_prob0_1.csv
cv0_05_num20_prob0_3.csv  cv0_1_num10_prob0_3.csv  cv0_3_num5_prob0_3.csv  cv0_5_num3_prob0_3.csv  cv0_7_num2_prob0_3.csv
cv0_05_num20_prob0_5.csv  cv0_1_num10_prob0_5.csv  cv0_3_num5_prob0_5.csv  cv0_5_num3_prob0_5.csv  cv0_7_num2_prob0_5.csv
cv0_05_num20_prob1_0.csv  cv0_1_num10_prob1_0.csv  cv0_3_num5_prob1_0.csv  cv0_5_num3_prob1_0.csv  cv0_7_num2_prob1_0.csv
```

They were generated by running the following script:
```
cd workloads
./workload_gen.sh
```
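
The workload file names appear to encode three numeric values (cv, num, and prob), with an underscore standing in for the decimal point. The sketch below is not part of the repository; it assumes that naming pattern, inferred only from the file list above, and the helper name parse_workload_name is hypothetical. It may be handy when grouping results by workload parameters.

```python
import re
from pathlib import Path

# Assumed pattern, inferred only from the file names listed above:
# cv<float>_num<int>_prob<float>.csv, with "_" replacing the decimal point.
WORKLOAD_NAME = re.compile(r"^cv(\d+_\d+)_num(\d+)_prob(\d+_\d+)\.csv$")


def parse_workload_name(filename):
    """Return the cv, num, and prob values encoded in a workload file name."""
    match = WORKLOAD_NAME.match(Path(filename).name)
    if match is None:
        raise ValueError(f"unexpected workload file name: {filename}")
    cv, num, prob = match.groups()
    return {
        "cv": float(cv.replace("_", ".")),
        "num": int(num),
        "prob": float(prob.replace("_", ".")),
    }


if __name__ == "__main__":
    # Lists the parameters of every workload when run from the repository root.
    for path in sorted(Path("data/workloads").glob("*.csv")):
        print(path.name, parse_workload_name(path.name))
```

For example, "cv0_05_num20_prob0_1.csv" would be read as cv = 0.05, num = 20, prob = 0.1 under this assumption.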

# Source Code

make_vectordb.py: a script to build a vector database from "data/filtered_data.pickle"

utils/
    - build_graph.py: a script containing helper functions for building the knowledge graph
    - parse_arxiv.py: a script containing helper functions for parsing the arxiv dataset
vector_graph/
    - bipartite_graph_dict.py: A custom implementation of the bipartite graph
    - bipartite_graph_networkx.py: An experimental implementation of the bipartite graph using networkx
    - embedding_models.py: A custom implementation of the embedding models for generating the text embeddings
workloads/
    - keyword_extractor.py
    - query_gen.py: A script for generating the text queries given paper data points
    - workload_gen.sh: A script for generating the workloads described in the report
testing/
    - inference.py: A script for executing our various search query engines on the generated workloads
zy_testing/
    - compute_metrics_cos.py: A script for computing the accuracy of our results utilizing various performance metrics

