python

load_dotenv(dotenv_path=r"C:\Users\mahan\OneDrive\Desktop\GenAIus\Preetha\.env")
Loads the environment variables defined in the .env file, which includes the API key required for Google’s generative AI.
API Key Configuration
python

api_key = os.getenv("GOOGLE_API_KEY")
if api_key:
    genai.configure(api_key=api_key)
else:
    logging.error("API key is not set in environment variables.")
    exit(1)
Retrieves the Google API key from the environment variables and configures the generative AI client. If the API key is not found, an error message is logged, and the program exits.
Model Instance Creation
python

model = genai.GenerativeModel("gemini-1.5-flash")  # Use your required model
Creates an instance of the generative model using the specified model name. This instance will be used to generate responses to user queries.
Loading Cleaned Text Data
python

file_path = r"C:\Users\mahan\OneDrive\Desktop\GenAIus\Preetha\AllCleanData\AllCleanData.txt"
try:
    with open(file_path, 'r', encoding='utf-8') as file:
        cleaned_text = file.read()
except FileNotFoundError:
    logging.error(f"File not found: {file_path}")
    cleaned_text = None
Attempts to load cleaned text data from a specified file. If the file is not found, an error message is logged, and cleaned_text is set to None.
Function: generate_embeddings
python

def generate_embeddings(content, chunk_size=2000):
    model_name = "models/text-embedding-004"
    embeddings = []

    for i in range(0, len(content), chunk_size):
        chunk = content[i:i + chunk_size]
        try:
            # Get embeddings and extract the vector (assumed stored in 'embedding' key)
            chunk_embeddings = genai.embed_content(content=chunk, model=model_name)
            embeddings.append(chunk_embeddings['embedding'])  # Extract numeric vector
        except Exception as e:
            logging.error(f"Error embedding content: {e}")
            return None
    return embeddings
Purpose: Generates embeddings for the cleaned text data by splitting it into chunks and embedding each chunk.
Parameters:
content: The text content to be embedded.
chunk_size: The size of each chunk of text to be processed (default is 2000 characters).
Functionality:
Loops through the content in chunks, embedding each chunk using the specified model.
Appends each embedding to the embeddings list.
Handles exceptions during the embedding process by logging errors and returning None if an error occurs.
Embeddings Generation on Server Start
python

embeddings = generate_embeddings(cleaned_text) if cleaned_text else None
if embeddings is None:
    logging.error("Failed to generate embeddings.")
Calls the generate_embeddings function to create embeddings from the cleaned text when the server starts. If embeddings cannot be generated, an error message is logged.
Function: find_relevant_chunk
python

def find_relevant_chunk(user_question):
    # Embed the user question
    try:
        query_embedding = genai.embed_content(content=user_question, model="models/text-embedding-004")
        query_vector = query_embedding['embedding']  # Extract the actual embedding vector
    except Exception as e:
        logging.error(f"Error embedding question: {e}")
        return None

    # Calculate cosine similarity between the query and all text chunks
    similarities = cosine_similarity([query_vector], embeddings)
    most_similar_index = np.argmax(similarities)

    # Retrieve the most similar text chunk
    chunk_size = 2000
    start_idx = most_similar_index * chunk_size
    return cleaned_text[start_idx:start_idx + chunk_size]
Purpose: Finds the most relevant chunk of text in response to the user's question by calculating the cosine similarity between the user's question and the pre-generated embeddings.
Parameters:
user_question: The question posed by the user.
Functionality:
Embeds the user’s question using the same model as the text chunks.
Calculates the cosine similarity between the embedded question and all pre-generated embeddings.
Identifies the index of the most similar embedding.
Returns the corresponding chunk of cleaned text based on the index.
Function: generate_response
python

def generate_response(user_question):
    try:
        relevant_chunk = find_relevant_chunk(user_question)
        if not relevant_chunk:
            return "Sorry, I could not find any relevant information in the knowledge base."

        prompt = f"""
        -You are "GenAIus KT", and your role is to help with the onboarding process.
        -Answer the following question based on the knowledge base: '{user_question}'. Here is the relevant information: {relevant_chunk}.
        -If you can't find relevant information in the context, generate an answer.
        -Be formal, friendly, and professional.
        - Donot provide any technical question answers if it is not mentioned in the knowledgebase. 
        """
        
        response = model.generate_content([prompt])
        return response.text if hasattr(response, 'text') else "No response content found."
    except Exception as e:
        logging.error(f"Error generating response: {e}")
        return "An error occurred while generating the response."
Purpose: Generates a response based on the user's question and the relevant text chunk found.
Parameters:
user_question: The question asked by the user.
Functionality:
Calls find_relevant_chunk to retrieve the relevant text chunk for the question.
Constructs a prompt for the generative model, instructing it to respond formally and professionally.
Generates a response using the generative model and returns the response text. If an error occurs during this process, it logs the error and returns a generic error message.
API Endpoint: /api/chat
python

@app.route('/api/chat', methods=['POST'])
def chat():
    data = request.json  # Get data from the POST request
    user_question = data.get('message')

    if not user_question:
        return jsonify({"error": "No message provided"}), 400

    # Generate response from the model using the relevant chunk
    bot_response = generate_response(user_question)

    # Convert the bot's response to HTML using Markdown
    bot_response_html = markdown(bot_response)

    # Send the bot's response back to the frontend
    return jsonify({"reply": bot_response_html})
Purpose: This function handles incoming chat requests from the frontend.
Functionality:
Expects a JSON payload with the user’s question in the field message.
Validates that a message is provided; if not, it returns a 400 error.
Calls generate_response to produce a response based on the user’s question.
Converts the response to HTML format using the Markdown library for proper rendering on the frontend.
Returns the response as JSON, which can be easily consumed by the frontend.
Running the Flask App
python

if __name__ == '__main__':
    app.run(debug=True)
This block checks if the script is being run directly (as opposed to being imported as a module). If it is, it starts the Flask development server in debug mode, allowing for real-time code updates and error logging.


2. query.py
import os
import logging
from dotenv import load_dotenv
import google.generativeai as genai

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Load environment variables for API key
load_dotenv()

# Configure the Gemini API key
api_key = os.getenv("GOOGLE_API_KEY")
if api_key:
    genai.configure(api_key=api_key)
else:
    logging.error("API key is not set in the environment variables.")
    exit(1)  # Exit the program if API key is not available

# Create a model instance
model = genai.GenerativeModel("gemini-1.5-flash")  # Adjust the model name as needed

# Function to list available models
def list_available_models():
    try:
        models = genai.list_models()
        model_names = [model.name for model in models]  # Extracting model names
        logging.info("Available models: " + ", ".join(model_names))
    except Exception as e:
        logging.error(f"Error listing models: {e}")

# Function to read cleaned text from file with error handling
def get_clean_text(file_path):
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            text = file.read()
        return text
    except FileNotFoundError:
        logging.error(f"File not found: {file_path}")
        return None
    except Exception as e:
        logging.error(f"Error reading file: {e}")
        return None

# Function to generate embeddings for the cleaned data
def generate_embeddings(content, chunk_size=2000):  # Set a chunk size smaller than the limit
    model_name = "models/text-embedding-004"  # Use the available text embedding model
    embeddings = []
    
    # Split the content into chunks
    for i in range(0, len(content), chunk_size):
        chunk = content[i:i + chunk_size]
        try:
            chunk_embeddings = genai.embed_content(content=chunk, model=model_name)
            embeddings.append(chunk_embeddings)
        except Exception as e:
            logging.error(f"Error embedding content: {e}")
            return None

    return embeddings  # Return all embeddings as a list

# Function to generate a response from the generative model using RAG
def generate_response(user_question, cleaned_text):
    try:
        # Format the prompt to guide the model for RAG
        prompt = f"""Using the knowledge base, answer the following question: '{user_question}'. Here is the information: {cleaned_text}. try to be relevant and answer to some vague questions also. If you cannot find anything in the context document try to generate it by yourself.
        If someone ask who are you then asnwer that you are the 'your onboarding buddy'.
         Act professional and freindly but be sweet as the same time."""
        
        # Using the generate_content method from the model instance
        response = model.generate_content([prompt])  # Use the model instance
        return response.text if hasattr(response, 'text') else "No response content found."
    except Exception as e:
        logging.error(f"Error generating response: {e}")
        return "An error occurred while generating the response."

def main():
    # File path to the cleaned text
    file_path = "C:/Users/mahan/OneDrive/Desktop/GenAIus/Preetha/AllCleanData/AllCleanData.txt"

    # Process the text and create the FAISS index
    logging.info("Processing cleaned data...")
    raw_text = get_clean_text(file_path)
    if raw_text is not None:
        logging.info("Text data loaded successfully.")
        
        # List available models
        logging.info("Listing available models...")
        list_available_models()

        # Generate embeddings for the cleaned data
        embeddings = generate_embeddings(raw_text)
        if embeddings is not None:
            logging.info("Embeddings generated successfully.")
        else:
            logging.error("Failed to generate embeddings.")
            return  # Exit if embedding fails
    else:
        logging.info("Failed to read cleaned text.")
        return  # Exit if reading text fails

    # Take user queries in the console
    while True:
        user_question = input("\nAsk a question from the cleaned data (or type 'exit' to quit): ")
        if user_question.lower() == 'exit':
            break
        answer = generate_response(user_question, raw_text)  # Pass cleaned text to the response function
        print("Reply from Gemini: ", answer)

if __name__ == "__main__":
    main()

2. query.py -> Functions Explained
Imports
python

import os
import logging
from dotenv import load_dotenv
import google.generativeai as genai
os: For file and environment variable handling.
logging: To capture and record important events like errors.
dotenv: To load environment variables (e.g., API key).
genai: The AI library for handling generative responses.
Function: list_available_models
python

def list_available_models():
    try:
        models = genai.list_models()
        model_names = [model.name for model in models]  # Extracting model names
        logging.info("Available models: " + ", ".join(model_names))
    except Exception as e:
        logging.error(f"Error listing models: {e}")
Purpose: Lists all available models from the Google Generative AI library.
Steps:
Retrieves a list of models using genai.list_models.
Extracts the model names and logs them for reference.
Handles errors during the process.
Key Logic: Helps developers understand what AI models are available for use.
Function: get_clean_text
python

def get_clean_text(file_path):
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            text = file.read()
        return text
    except FileNotFoundError:
        logging.error(f"File not found: {file_path}")
        return None
    except Exception as e:
        logging.error(f"Error reading file: {e}")
        return None
Purpose: Reads cleaned data from a file and returns it as a string.
Parameters:
file_path: The location of the file containing cleaned text.
Steps:
Opens and reads the file content.
Returns the content or handles file read errors by logging them.
Key Logic: Provides a single access point to retrieve the knowledge base for AI processing.
Function: generate_embeddings
python

def generate_embeddings(content, chunk_size=2000):
    model_name = "models/text-embedding-004"
    embeddings = []
    
    for i in range(0, len(content), chunk_size):
        chunk = content[i:i + chunk_size]
        try:
            chunk_embeddings = genai.embed_content(content=chunk, model=model_name)
            embeddings.append(chunk_embeddings)
        except Exception as e:
            logging.error(f"Error embedding content: {e}")
            return None

    return embeddings
Purpose: Generates embeddings for chunks of content using a text embedding model.
Parameters:
content: Text content that needs to be embedded.
chunk_size: The size of each chunk for processing (default is 2000 characters).
Steps:
Splits the content into smaller chunks for embedding.
Uses genai.embed_content to generate embeddings for each chunk.
Handles errors and logs any embedding failures.
Key Logic: Handles large content efficiently by processing in smaller parts.
Function: generate_response
python

def generate_response(user_question, cleaned_text):
    try:
        prompt = f"""Using the knowledge base, answer the following question: '{user_question}'. Here is the information: {cleaned_text}. try to be relevant and answer to some vague questions also. If you cannot find anything in the context document try to generate it by yourself.
        If someone ask who are you then asnwer that you are the 'your onboarding buddy'.
         Act professional and freindly but be sweet as the same time."""
        
        response = model.generate_content([prompt])
        return response.text if hasattr(response, 'text') else "No response content found."
    except Exception as e:
        logging.error(f"Error generating response: {e}")
        return "An error occurred while generating the response."
Purpose: Similar to app.py, generates a response based on user input but with a slightly different prompt. Handles onboarding scenarios as well.
Parameters:
user_question: The user's question.
cleaned_text: Knowledge base data.
Key Logic: Emphasizes user-friendliness and onboarding assistance.
Function: main
python

def main():
    file_path = "C:/Users/mahan/OneDrive/Desktop/GenAIus/Preetha/AllCleanData.txt"
    text = get_clean_text(file_path)
    if not text:
        print("File not found or could not be read.")
        return

    embeddings = generate_embeddings(text)
    if embeddings:
        print("Embeddings generated successfully.")
    else:
        print("Failed to generate embeddings.")
Purpose: Orchestrates the primary operations like reading the knowledge base and generating embeddings.
Steps:
Retrieves cleaned text using get_clean_text.
Generates embeddings for the content.
Provides console output based on success or failure.
Key Logic: A high-level overview of essential backend processes.




Tables:
--------------------------------------------------------------------------------
Extracting data from CSV file: AI_Tools_Connectivity_Manufacturing_Timeline.csv

Task Name, Start Date, End Date, Duration, Responsible Person, Status
Risk Management Planning, 2024-02-21 00:00:00, 2024-03-11 00:00:00, 19, Team Member 10, In Progress
Manufacturing Workflow Simulation, 2024-03-10 00:00:00, 2024-03-15 00:00:00, 5, Team Member 7, Delayed
Performance Monitoring Setup, 2024-04-03 00:00:00, 2024-04-20 00:00:00, 17, Team Member 2, Not Started
AI Model Fine-tuning, 2024-04-18 00:00:00, 2024-05-01 00:00:00, 13, Team Member 4, Delayed
Compliance Assessment, 2024-02-16 00:00:00, 2024-02-22 00:00:00, 6, Team Member 1, Completed
Deployment Preparation, 2024-04-28 00:00:00, 2024-05-04 00:00:00, 6, Team Member 3, Completed
Scalability Assessment, 2024-03-26 00:00:00, 2024-04-04 00:00:00, 9, Team Member 8, Completed
Final AI Tool Configuration, 2024-03-17 00:00:00, 2024-03-27 00:00:00, 10, Team Member 6, In Progress
AI Tool Testing, 2024-01-24 00:00:00, 2024-01-31 00:00:00, 7, Team Member 2, Not Started
Data Flow Optimization, 2024-01-31 00:00:00, 2024-02-16 00:00:00, 16, Team Member 1, Not Started
AI Tool Optimization, 2024-02-11 00:00:00, 2024-02-16 00:00:00, 5, Team Member 1, Delayed
Full-scale Deployment, 2024-01-18 00:00:00, 2024-01-23 00:00:00, 5, Team Member 2, In Progress
System Architecture Design, 2024-03-22 00:00:00, 2024-03-29 00:00:00, 7, Team Member 1, In Progress
Cloud Integration, 2024-05-04 00:00:00, 2024-05-16 00:00:00, 12, Team Member 4, In Progress
Final Project Review, 2024-03-28 00:00:00, 2024-04-08 00:00:00, 11, Team Member 10, Delayed
Post-Deployment Review, 2024-04-28 00:00:00, 2024-05-07 00:00:00, 9, Team Member 1, In Progress
System Security Assessment, 2024-04-11 00:00:00, 2024-04-26 00:00:00, 15, Team Member 3, In Progress
Initial Planning, 2024-01-11 00:00:00, 2024-01-19 00:00:00, 8, Team Member 9, Delayed
Integration Testing, 2024-03-22 00:00:00, 2024-03-30 00:00:00, 8, Team Member 3, Completed
Advanced Analytics Implementation, 2024-04-02 00:00:00, 2024-04-10 00:00:00, 8, Team Member 10, Delayed
User Training Sessions, 2024-03-16 00:00:00, 2024-03-27 00:00:00, 11, Team Member 5, Not Started
Pilot Deployment, 2024-01-24 00:00:00, 2024-02-07 00:00:00, 14, Team Member 10, In Progress
System Documentation, 2024-05-06 00:00:00, 2024-05-23 00:00:00, 17, Team Member 10, In Progress
Connectivity Module Integration, 2024-03-21 00:00:00, 2024-03-29 00:00:00, 8, Team Member 4, In Progress
Resource Allocation Planning, 2024-05-06 00:00:00, 2024-05-11 00:00:00, 5, Team Member 8, Delayed
Requirement Gathering, 2024-02-05 00:00:00, 2024-02-11 00:00:00, 6, Team Member 1, In Progress
Feedback Collection, 2024-04-07 00:00:00, 2024-04-14 00:00:00, 7, Team Member 9, Delayed
Data Analysis Setup, 2024-02-15 00:00:00, 2024-02-23 00:00:00, 8, Team Member 3, Delayed
Technical Support Setup, 2024-03-25 00:00:00, 2024-04-13 00:00:00, 19, Team Member 3, In Progress
Prototype Testing, 2024-02-19 00:00:00, 2024-02-29 00:00:00, 10, Team Member 5, Not Started
Manufacturing System Synchronization, 2024-02-28 00:00:00, 2024-03-07 00:00:00, 8, Team Member 8, Not Started
AI Model Development, 2024-03-16 00:00:00, 2024-03-21 00:00:00, 5, Team Member 7, Completed
Edge Computing Optimization, 2024-01-11 00:00:00, 2024-01-26 00:00:00, 15, Team Member 5, Completed
--------------------------------------------------------------------------------
Extracting data from CSV file: FlexStaff_Project_Timeline.csv

Task Name, Start Date, End Date, Duration, Responsible Person, Status
Task 1, 2024-08-28, 2024-11-22, 86, Alex Turner, In Progress
Task 2, 2024-08-25, 2024-10-24, 60, Liam O'Connor, In Progress
Task 3, 2024-01-30, 2024-11-22, 297, Nina Gomez, Delayed
Task 4, 2024-03-24, 2024-06-09, 77, Nina Gomez, On Hold
Task 5, 2024-04-27, 2024-09-21, 147, Liam O'Connor, Not Started
Task 6, 2024-11-12, 2024-12-11, 29, Liam O'Connor, On Hold
Task 7, 2024-05-20, 2024-07-28, 69, Sam Patel, Delayed
Task 8, 2024-03-16, 2024-11-20, 249, Alex Turner, On Hold
Task 9, 2024-04-09, 2024-09-22, 166, Alex Turner, Not Started
Task 10, 2024-07-29, 2024-09-01, 34, Liam O'Connor, Delayed
Task 11, 2024-01-09, 2024-11-14, 310, Alex Turner, On Hold
Task 12, 2024-10-30, 2024-11-13, 14, Liam O'Connor, Not Started
Task 13, 2024-11-19, 2024-12-05, 16, Sophia Lee, Completed
Task 14, 2024-12-17, 2024-12-17, 0, Nina Gomez, Not Started
Task 15, 2024-03-03, 2024-06-02, 91, Alex Turner, Delayed
Task 16, 2024-07-30, 2024-12-18, 141, Liam O'Connor, Completed
Task 17, 2024-12-05, 2024-12-28, 23, Nina Gomez, Delayed
Task 18, 2024-01-19, 2024-10-10, 265, Sam Patel, Completed
Task 19, 2024-10-07, 2024-12-27, 81, Alex Turner, In Progress
Task 20, 2024-10-08, 2024-11-16, 39, Alex Turner, On Hold
Task 21, 2024-08-19, 2024-11-09, 82, Nina Gomez, On Hold
Task 22, 2024-11-03, 2024-11-19, 16, Alex Turner, On Hold
Task 23, 2024-06-14, 2024-12-24, 193, Alex Turner, On Hold
Task 24, 2024-02-16, 2024-04-12, 56, Nina Gomez, On Hold
Task 25, 2024-06-01, 2024-10-17, 138, Nina Gomez, Not Started
Task 26, 2024-10-03, 2024-10-04, 1, Sam Patel, On Hold
Task 27, 2024-03-22, 2024-11-27, 250, Liam O'Connor, Delayed
Task 28, 2024-06-18, 2024-11-20, 155, Nina Gomez, On Hold
Task 29, 2024-10-25, 2024-11-22, 28, Sam Patel, Completed
Task 30, 2024-08-28, 2024-12-25, 119, Nina Gomez, Delayed
Task 31, 2024-03-12, 2024-09-11, 183, Sam Patel, In Progress
Task 32, 2024-01-18, 2024-09-20, 246, Sophia Lee, On Hold
Task 33, 2024-02-02, 2024-07-02, 151, Alex Turner, Delayed
Task 34, 2024-10-21, 2024-12-18, 58, Sophia Lee, In Progress
Task 35, 2024-11-13, 2024-12-11, 28, Liam O'Connor, Delayed
Task 36, 2024-12-25, 2024-12-26, 1, Alex Turner, On Hold
Task 37, 2024-10-12, 2024-12-14, 63, Nina Gomez, Delayed
Task 38, 2024-04-23, 2024-04-29, 6, Sophia Lee, Delayed
Task 39, 2024-04-23, 2024-07-24, 92, Sam Patel, Not Started
Task 40, 2024-03-31, 2024-10-19, 202, Sophia Lee, On Hold
--------------------------------------------------------------------------------
Extracting data from CSV file: Project Timeline for CloudShift Seamless Cloud Transformation for Telecom NextGen Inc.csv

Task Name, Start Date, End Date, Duration (days), Responsible Person, Status
Project Kickoff, 1/2/2024, 1/5/2024, 3, Michael Reed, Completed
Requirements Gathering, 1/6/2024, 1/15/2024, 9, Sarah Lin, Completed
Stakeholder Alignment Meetings, 1/16/2024, 1/18/2024, 2, David Chen, Completed
Technical Architecture Design, 1/19/2024, 1/29/2024, 10, Emily Jackson, Completed
Cloud Vendor Evaluation and Selection, 1/30/2024, 2/5/2024, 6, Mark Wilson, Completed
Initial Cloud Environment Setup, 2/6/2024, 2/15/2024, 9, Nathan Hall, Completed
Security Assessment and Planning, 2/16/2024, 2/23/2024, 7, Olivia Martin, Completed
Application Migration Strategy, 2/24/2024, 3/2/2024, 6, Megan Evans, Completed
Legacy System Integration, 3/3/2024, 3/12/2024, 9, Carlos Rodriguez, In Progress
Network Configuration, 3/13/2024, 3/20/2024, 7, Sophia Lee, In Progress
Database Migration, 3/21/2024, 3/31/2024, 10, David Harris, Not Started
Performance Testing, 4/1/2024, 4/10/2024, 9, Daniel Scott, Not Started
Cloud Environment Scaling, 4/11/2024, 4/20/2024, 9, Rebecca Adams, Not Started
Data Backup and Recovery Plan, 4/21/2024, 4/28/2024, 7, Joseph Perez, Not Started
User Access Management Implementation, 4/29/2024, 5/5/2024, 6, Emily Johnson, Not Started
Cloud Cost Optimization, 5/6/2024, 5/15/2024, 9, Michael Anderson, Not Started
Training for Cloud Operations Team, 5/16/2024, 5/23/2024, 7, Jessica Brown, Not Started
Monitoring System Setup, 5/24/2024, 5/30/2024, 6, Eric Thomas, Not Started
Compliance Audits, 6/1/2024, 6/7/2024, 6, Christine White, Not Started
Final Cloud Deployment, 6/8/2024, 6/15/2024, 7, Brian Davis, Not Started
Post-Deployment Monitoring, 6/16/2024, 6/23/2024, 7, Lisa Nguyen, Not Started
Application Optimization, 6/24/2024, 7/2/2024, 8, Samuel Lopez, Not Started
Disaster Recovery Testing, 7/3/2024, 7/10/2024, 7, Anthony King, Not Started
User Training Sessions, 7/11/2024, 7/20/2024, 9, Angela Martinez, Not Started
Cloud Environment Handover, 7/21/2024, 7/25/2024, 4, Matthew Cooper, Not Started
Cloud Security Monitoring Setup, 7/26/2024, 8/1/2024, 6, Natalie Ross, Not Started
Performance Monitoring System Update, 8/2/2024, 8/10/2024, 8, Kevin Turner, Not Started
Final Security Review, 8/11/2024, 8/15/2024, 4, Andrew Moore, Not Started
Final Handover and Project Closure, 8/16/2024, 8/20/2024, 4, Emily Johnson, Not Started
