<div align="center">
    <h1>Awesome AI Tools</h1>
    <a href="https://awesome.re"><img src="https://awesome.re/badge.svg"/></a>
</div>

English | [中文](README-CN.md)

This repo collects AI-related utilities. 


<a href="https://www.buymeacoffee.com/ikaijuaawesomeaitools" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/default-orange.png" alt="Buy Me A Coffee" height="41" width="174"></a>

## All Categories
- [All Categories](#all-categories)
  - [ChatGPT and other closed-source LLMs](#chatgpt-and-other-closed-source-llms)
  - [AI Search engine](#ai-search-engine)
  - [Open Source LLMs](#open-source-llms)
  - [GPT/LLMs Applications](#gpt-llms-applications)
  - [AI Image Creation](#ai-image-creation)
  - [LLM Prompts](#llm-prompts)
  - [LLM Leaderboard](#llm-leaderboard)
  - [LLM training platform](#llm-training-platform)
  - [Applications that integrate multiple LLMs](#applications-that-integrate-multiple-llms)
  - [AI Agent](#ai-agent)
  - [Writing](#writing)
  - [Programming Development](#programming-development)
  - [Translation](#translation)
  - [AI Conversation or AI Voice Conversation](#ai-conversation-or-ai-voice-conversation)
  - [Speech Recognition](#speech-recognition)
  - [Music Recognition](#music-recognition)
  - [Text To Speech](#text-to-speech)
  - [Voice Processing](#voice-processing)
  - [AI generated music or sound effects](#ai-generated-music-or-sound-effects)
  - [Speech translation](#speech-translation)
  - [Video Creation](#video-creation)
  - [Video Content Summary](#video-content-summary)
  - [OCR (Optical Character Recognition)](#ocr)

### ChatGPT and other closed-source LLMs
| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- |
| ChatGPT | OpenAI's ChatGPT | [URL](https://chat.openai.com) | Free/Paid | 
| Claude| Anthropic's AI assistant|[URL](https://claude.ai/)| Free/Paid|
| Gemini| Google's conversational AI chat service, built on Google's latest LLM family: Gemini Nano, Gemini Pro, and Gemini Ultra. Gemini Pro is available through the API and SDK. Gemini is built from the ground up for multimodality, reasoning seamlessly across text, images, video, audio, and code. |[URL](https://gemini.google.com/) <br> dev: [URL](https://ai.google.dev/)|Free|
| Microsoft Copilot| Microsoft's AI assistant.|[URL](https://copilot.microsoft.com/)|Free|
| Le Chat| Mistral.ai's conversational AI chat service|[URL](https://chat.mistral.ai/chat)|Free|

### AI Search engine
| Name | Description | Links | Fees |  
| --- | --- | --- | --- |
| Perplexity.ai | AI-driven conversational search engine. | [URL](https://www.perplexity.ai) | Free|
| You.com | A search engine in conversation mode | [URL](https://you.com) | Free |

### Open Source LLMs
| Name | Description | Links | Fees |
| ---- | ----------------------------- | --- | --- |
| Llama 3 | Llama 3 is a large language model developed by Meta AI. It is the successor to Meta's Llama 2 language model. <br>Online demo:<br>[huggingface.co/Meta-Llama-3-70B-Instruct](https://huggingface.co/chat/models/meta-llama/Meta-Llama-3-70B-Instruct) |[GitHub](https://github.com/meta-llama/llama3) ![GitHub Repo stars](https://img.shields.io/github/stars/meta-llama/llama3?style=social)| Free |
| Mixtral |Mixtral 8x7B is a high-quality sparse mixture-of-experts (SMoE) model with open weights. Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. It matches or outperforms GPT-3.5 on most standard benchmarks. <br>Paper: https://arxiv.org/pdf/2401.04088.pdf <br>News: https://mistral.ai/news/mixtral-of-experts/ |[mistral-inference](https://github.com/mistralai/mistral-inference) ![GitHub Repo stars](https://img.shields.io/github/stars/mistralai/mistral-inference?style=social)<br>[mistral-finetune](https://github.com/mistralai/mistral-finetune) ![GitHub Repo stars](https://img.shields.io/github/stars/mistralai/mistral-finetune?style=social)|Free|
|grok-1|A large language model open sourced by xAI|[Github](https://github.com/xai-org/grok-1) ![GitHub Repo stars](https://img.shields.io/github/stars/xai-org/grok-1?style=social)|Free|
|Phi-3| Phi-3, a family of open AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and next size up across a variety of language, reasoning, coding, and math benchmarks.|[Github](https://github.com/microsoft/Phi-3CookBook) ![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/Phi-3CookBook?style=social)|Free|

### GPT LLMs Applications
| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- |
| Poe | AI product built by Quora. Can use ChatGPT, Sage, Dragonfly, Claude bots for free. All you need is an email address to register. GPT-4 can be used once a day for free | [URL](https://poe.com/) | Free, with paid upgrades|
| HuggingChat|Open source codebase powering the HuggingChat app. [URL](https://huggingface.co/chat/)|[Github](https://github.com/huggingface/chat-ui) ![GitHub Repo stars](https://img.shields.io/github/stars/huggingface/chat-ui?style=social)|Free|
| NotebookLM |AI Research Assistant developed by Google. Upload PDFs, websites, YouTube videos, audio files, Google Docs, or Google Slides, and NotebookLM will summarize them and make interesting connections between topics. Audio Overview feature can turn your sources into engaging “Deep Dive” discussions with one click. |[URL](https://notebooklm.google.com/)|Free|
| Learn about |AI learning assistant developed by Google. Grasp new topics and deepen your understanding with a conversational learning companion that adapts to your unique curiosity and learning goals.|[URL](https://learning.google.com/experiments/learn-about)|Free|
| monica | AI assistant that provides help with a variety of tasks such as searching, reading, writing, translating, drawing, and more. Standalone apps and browser plug-ins available| [URL](https://monica.im) <br> [chrome extension](https://chromewebstore.google.com/detail/monica-your-ai-copilot-po/ofpnmcalabcbjgholdjcjblkibolbppb)|Free, with paid upgrades|
| ollama |Get up and running with Llama 2, Mistral, Gemma, and other large language models.|[Github](https://github.com/ollama/ollama) ![GitHub Repo stars](https://img.shields.io/github/stars/ollama/ollama?style=social)| Free |
| openai/openai-python | The official Python library for the OpenAI API. It is generated from the [OpenAPI specification](https://github.com/openai/openai-openapi) with [Stainless](https://stainlessapi.com/). | [Github](https://github.com/openai/openai-python) ![GitHub Repo stars](https://img.shields.io/github/stars/openai/openai-python?style=social)| Free, requires an OpenAI [API key](https://platform.openai.com/account/api-keys) |
|sashabaranov/go-openai|This library provides unofficial Go clients for the OpenAI API. Supports ChatGPT, GPT-3, GPT-4, and DALL·E 2.|[Github](https://github.com/sashabaranov/go-openai) ![GitHub Repo stars](https://img.shields.io/github/stars/sashabaranov/go-openai?style=social)|Free|
|langchain|LangChain is a framework for developing applications powered by language models.|[Github](https://github.com/langchain-ai/langchain) ![GitHub Repo stars](https://img.shields.io/github/stars/langchain-ai/langchain?style=social)|Free|
|Helicone AI|Helicone is the open-source LLM observability platform for logging, monitoring, and debugging AI applications.|[Github](https://github.com/Helicone/helicone) ![GitHub Repo stars](https://img.shields.io/github/stars/Helicone/helicone?style=social)|Free|
|ChatGPT-Next-Web|One-Click to get a well-designed cross-platform ChatGPT web UI, with GPT3, GPT4 & Gemini Pro support.|[Github](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web) ![GitHub Repo stars](https://img.shields.io/github/stars/ChatGPTNextWeb/ChatGPT-Next-Web?style=social)|Free|
| screenshot-to-code | This simple app converts a screenshot to HTML/Tailwind CSS. It uses GPT-4 Vision to generate the code and DALL-E 3 to generate similar-looking images. You can now also enter a URL to clone a live website! | [GitHub](https://github.com/abi/screenshot-to-code) ![GitHub Repo stars](https://img.shields.io/github/stars/abi/screenshot-to-code?style=social)| Free, need access to GPT-4 Vision|
| Chatbox | Desktop application for the ChatGPT API (OpenAI API) that stores all chat messages and prompts locally, reducing the risk of data loss. Somewhat more stable than the web version. | [GitHub](https://github.com/Bin-Huang/chatbox) ![GitHub Repo stars](https://img.shields.io/github/stars/Bin-Huang/chatbox?style=social)| Free, requires an [OpenAI API key](https://platform.openai.com/account/api-keys)|
| gpt-crawler | Crawl a site to generate knowledge files to create your own custom GPT from a URL | [Github](https://github.com/BuilderIO/gpt-crawler)![GitHub Repo stars](https://img.shields.io/github/stars/BuilderIO/gpt-crawler?style=social)| Free |
| ChatGPT-Shortcut | Open source. ChatGPT shortcut commands that double productivity, organized by domain and function. Prompts can be filtered by tag, searched by keyword, and copied with one click. |[GitHub](https://github.com/rockbenben/ChatGPT-Shortcut) ![GitHub Repo stars](https://img.shields.io/github/stars/rockbenben/ChatGPT-Shortcut?style=social)|Free|
|ChatGPT Sidebar|ChatGPT Sidebar is an artificial intelligence assistant you can use while browsing any website. |[URL](https://chrome.google.com/webstore/detail/chatgpt-sidebar-support-g/difoiogjjojoaoomphldepapgpbgkhkb)|Free|
| WebChatGPT | Open source. Adds web access capability to ChatGPT. | [GitHub](https://github.com/qunash/chatgpt-advanced) <br>![GitHub Repo stars](https://img.shields.io/github/stars/qunash/chatgpt-advanced?style=social)| Free|
| AIPRM for ChatGPT |Browser plug-in that provides a curated set of ChatGPT prompt templates, lets you create your own, and lets you adjust the AI's tone and writing style.| [URL](https://chrome.google.com/webstore/detail/aiprm-for-chatgpt/ojnbohmppadfgpejeebfnmnknjdlckgj) | Free|
| GPTCache |⚡ GPTCache is a library for creating a semantic cache to store responses from LLM queries. It can be used to speed up and lower the cost of chat applications that rely on an LLM service. Its role is similar to that of Redis, applied to AIGC scenarios.| [Github](https://github.com/zilliztech/GPTCache) <br>![GitHub Repo stars](https://img.shields.io/github/stars/zilliztech/GPTCache?style=social)| Free|
| MindMac | Feature-rich & privacy-first native ChatGPT app for macOS to use OpenAI, Azure OpenAI, Anthropic Claude, OpenRouter all in one place, designed for maximum productivity. Currently available in 15 languages. | [URL](https://mindmac.app/) | Free, with paid upgrades|
| MemFree | Open-source hybrid AI search engine. Instantly get accurate answers from the internet, bookmarks, notes, and docs. Supports one-click deployment. | [Github](https://github.com/memfreeme/memfree) <br>![GitHub Repo stars](https://img.shields.io/github/stars/memfreeme/memfree?style=social)| Free; supports one-click self-hosting|

### AI Image Creation
| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- |
| Midjourney | Enter text or pictures to create pictures | [URL](https://www.midjourney.com) | Free accounts have limited usage minutes; paid upgrades available |
| Photoshop AI| Adobe Photoshop generative-fill| [URL](https://www.adobe.com/products/photoshop/generative-fill.html) |Paid|
| Stable diffusion webui | Open source project. Enter text or pictures to create pictures. Stable Diffusion web UI is a graphical interface for Stable Diffusion that also integrates many useful extension scripts. | [GitHub](https://github.com/AUTOMATIC1111/stable-diffusion-webui) <br> ![GitHub Repo stars](https://img.shields.io/github/stars/AUTOMATIC1111/stable-diffusion-webui?style=social)| Free|
| civitai | civitai.com is a platform for sharing AI image generation models. With its large number of models, it has become the main model exchange in the Stable Diffusion open-source community. | [URL](https://civitai.com/) | Free|
| clipdrop | Clipdrop by stability.ai offers many AI image processing tools, such as Stable Diffusion XL, Uncrop, Reimagine XL, and Stable Doodle. | [URL](https://clipdrop.co/) | Free/Paid |
| firefly | Adobe's AI image processing web site |[URL](https://firefly.adobe.com/)|Free/Paid|
| ideogram.ai | Enter text to create pictures. A product developed by a company founded by many ex-Googlers |[URL](https://ideogram.ai/)| Free/Paid |
| Skybox AI | Generate 360-degree panoramic images using text prompts  | [URL](https://skybox.blockadelabs.com/)| Free/Paid|
|DragGAN|Interactive Point-based Manipulation on the Generative Image Manifold|[GitHub](https://github.com/XingangPan/DragGAN) </br> ![GitHub Repo stars](https://img.shields.io/github/stars/XingangPan/DragGAN?style=social)|Free|
| visual-chatgpt | Create images with ChatGPT | [GitHub](https://github.com/microsoft/visual-chatgpt) </br> ![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/visual-chatgpt?style=social)| Free |
| Microsoft Bing Image Creator | Image Creator is a tool for creating pictures using DALL-E technology. In testing, **generated portraits looked unsightly**. | [URL](https://www.bing.com/images/create) | Free|
| remove.bg |Remove Image Background|[URL](https://www.remove.bg/)|Free/Paid|
| ControlNet |ControlNet is a neural network structure to control diffusion models by adding extra conditions.|[Github](https://github.com/lllyasviel/ControlNet) ![GitHub Repo stars](https://img.shields.io/github/stars/lllyasviel/ControlNet?style=social)|Free|
|StreamDiffusion| A Pipeline-Level Solution for Real-Time Interactive Generation|[Github](https://github.com/cumulo-autumn/StreamDiffusion) ![GitHub Repo stars](https://img.shields.io/github/stars/cumulo-autumn/StreamDiffusion?style=social)|Free|

### LLM Prompts
| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- |
|f/awesome-chatgpt-prompts|This repo includes ChatGPT prompt curation to use ChatGPT better.|[Github](https://github.com/f/awesome-chatgpt-prompts) ![GitHub Repo stars](https://img.shields.io/github/stars/f/awesome-chatgpt-prompts?style=social) |Free|

### LLM Leaderboard
| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- |
|LMSYS Chatbot Arena Leaderboard|LMSYS Chatbot Arena is a crowdsourced open platform for LLM evals. Collected over 1,000,000 human pairwise comparisons to rank LLMs with the Bradley-Terry model and display the model ratings in Elo-scale. |[URL](https://chat.lmsys.org/) |Free|
|Artificial Analysis|Artificial Analysis is a platform that provides AI model and service provider comparisons and benchmarks to help users make informed decisions when choosing AI models and service providers. The platform provides comparative data on a wide range of popular AI models, including OpenAI's GPT-4, Meta's Llama 3, and Anthropic's Claude series, covering performance metrics such as response time, latency, and cost.|[URL](https://artificialanalysis.ai/)|Free|

### LLM training platform
| Name | Description | Links | Fees |
| ---- | ----------------------------- | --- | --- |
| lm-sys/FastChat | An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. | [Github](https://github.com/lm-sys/FastChat) ![GitHub Repo stars](https://img.shields.io/github/stars/lm-sys/FastChat?style=social)| Free |

### Applications that integrate multiple LLMs
| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | ---- |
| chathub | Use different chatbots in one app, currently supporting ChatGPT, new Bing Chat, Google Bard, Claude, and 10+ open-source models including Alpaca, Vicuna, ChatGLM etc. | [GitHub](https://github.com/chathub-dev/chathub) </br>![GitHub Repo stars](https://img.shields.io/github/stars/chathub-dev/chathub?style=social)|Free/Paid|
| ChatALL | Concurrently chat with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, Claude, ChatGLM, MOSS, and more, discover the best answers| [GitHub](https://github.com/sunner/ChatALL)  </br> ![GitHub Repo stars](https://img.shields.io/github/stars/sunner/ChatALL?style=social)|Free|
| Harbor | Effortlessly run LLM backends, APIs, frontends, and services with one command. | [GitHub](https://github.com/av/harbor) </br> ![GitHub Repo stars](https://img.shields.io/github/stars/av/harbor?style=social)|  Free |

### AI Agent
| Name | Description | Links | Fees |  
| ---- | ----------------------------- | --- | --- |
|Auto-GPT|An experimental open-source attempt to make GPT-4 fully autonomous.|[GitHub](https://github.com/Torantulino/Auto-GPT) <br> ![GitHub Repo stars](https://img.shields.io/github/stars/Torantulino/Auto-GPT?style=social)|Free|
|OthersideAI/self-operating-computer|A framework to enable multimodal models to operate a computer.|[Github](https://github.com/OthersideAI/self-operating-computer) ![GitHub Repo stars](https://img.shields.io/github/stars/OthersideAI/self-operating-computer?style=social)|Free, GPT-4V required|
|AppAgent|Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.|[Github](https://github.com/mnotgod96/AppAgent) ![GitHub Repo stars](https://img.shields.io/github/stars/mnotgod96/AppAgent?style=social)|Free|
|microsoft/autogen|AutoGen is an open-source programming framework for building AI agents and facilitating cooperation among multiple agents to solve tasks. |[Github](https://github.com/microsoft/autogen) ![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/autogen?style=social)|Free|
|potpie-ai/potpie|Open Source AI Agents for your codebase in minutes. Use pre-built agents for Q&A, Testing, Debugging and System Design or create your own purpose-built agents. |[URL](https://potpie.ai) , [Github](https://github.com/potpie-ai/potpie) ![GitHub Repo stars](https://img.shields.io/github/stars/potpie-ai/potpie?style=social)|Free Trial|

### Writing
| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- |
| Notion AI | AI-assisted note-taking software | [URL](https://www.notion.so)| Limited free AI trial; AI features $10/month |
| DeepL Write | English and German writing tool that fixes writing errors and rewrites sentences promptly. | [URL](https://www.deepl.com/write) | Free version with a text length limit; paid upgrade available |
| Grammarly | Edit and correct your grammar, spelling, punctuation, and more with your personal writing assistant, grammar checker, and editor.| [URL](https://app.grammarly.com/) | Free/Paid|

### Programming Development
| Name | Description | Links | Fees |  
| ---- | ----------------------------- | --- | --- | 
| GitHub Copilot | A code writing assistant developed by GitHub and OpenAI | [URL](https://github.com/features/copilot) | Paid |
| Cursor | A collaborative code editor using GPT | [URL](https://www.cursor.so) | Paid/Free Trial |
| MarsCode |Built-in AI programming assistant with capabilities like code completion, explanation, and debugging for faster development.|[URL](https://www.marscode.com/)|Free|
| ai-code-translator | Open source project. Translates code from one language to another using chatgpt. | [GitHub](https://github.com/mckaywrigley/ai-code-translator) </br> ![GitHub Repo stars](https://img.shields.io/github/stars/mckaywrigley/ai-code-translator?style=social)| Free, requires OpenAI API key|
| Amazon CodeWhisperer | A code writing assistant developed by Amazon| [URL](https://aws.amazon.com/cn/codewhisperer)| Free for Individual Use|
|gpt-engineer|GPT Engineer is made to be easy to adapt, extend, and make your agent learn how you want your code to look. It generates an entire codebase based on a prompt.|[GitHub](https://github.com/AntonOsika/gpt-engineer) ![GitHub Repo stars](https://img.shields.io/github/stars/AntonOsika/gpt-engineer?style=social)|Free|
| Codeium | Powerful in-IDE AI coding assistant|[URL](https://codeium.com/)|Free/Paid|
| scalene |Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals|[Github](https://github.com/plasma-umass/scalene) </br>![GitHub Repo stars](https://img.shields.io/github/stars/plasma-umass/scalene?style=social)|Free|
| Fitten Code | Fitten Code is an AI programming assistant driven by Fitten LLM models. It can automatically generate code, improve development efficiency, help you debug, and save you time. It can also chat with you and solve your programming problems. It is free and supports over 80 languages, including Python, C++, JavaScript, TypeScript, and Java. Fitten Code supports Visual Studio Code and JetBrains IDEs, including IntelliJ IDEA, PyCharm, and WebStorm.|[URL](https://code.fittentech.com/en?lang=en)| Free |
|flappy|Production-Ready LLM Agent SDK for Every Developer|[GitHub](https://github.com/pleisto/flappy) ![GitHub Repo stars](https://img.shields.io/github/stars/pleisto/flappy.svg?style=social) |Free|
| Plandex | Open source, terminal-based AI programming engine for complex tasks | [GitHub](https://github.com/plandex-ai/plandex) ![GitHub Repo stars](https://img.shields.io/github/stars/plandex-ai/plandex?style=social)| Free |
| Mistral/Codestral|[Empowering developers and democratising coding with Mistral AI.](https://mistral.ai/news/codestral/) Models: https://huggingface.co/mistralai/Codestral-22B-v0.1|[URL](https://chat.mistral.ai/chat)|Free|

### Translation
| Name | Description | Links | Fees |  
| ---- | ----------------------------- | --- | --- |
| immersive-translate | Open source project. Immersive bilingual web translation extension | [GitHub](https://github.com/immersive-translate/immersive-translate/) </br> ![GitHub Repo stars](https://img.shields.io/github/stars/immersive-translate/immersive-translate?style=social)| Free |
| DeepL | Accurate and instant translation tool, currently supporting 31 languages | [URL](https://www.deepl.com/translator) | Free/Paid |
| openai-translator | Open source project. Select-to-translate browser extension and cross-platform desktop application based on the ChatGPT API | [GitHub](https://github.com/yetone/openai-translator) <br> ![GitHub Repo stars](https://img.shields.io/github/stars/yetone/openai-translator?style=social)| Free, requires OpenAI API key |

### AI Conversation or AI Voice Conversation

| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- |
| pi.ai | An AI that is very good at conversation, so you can comfortably chat with it all day. It supports both text and speech. Voice input relies on Apple's input system. Good for practicing English conversation and listening.| [URL](https://pi.ai/) | Free|
|Voice Control for ChatGPT | This Chrome extension allows you to have voice conversations with ChatGPT. | [URL](https://chrome.google.com/webstore/detail/voice-control-for-chatgpt/eollffkcakegifhacjnlnegohfdlidhn) | Free, requires a ChatGPT account | 
|SpeechGPT|SpeechGPT is a web application that enables you to converse with ChatGPT.|[GitHub](https://github.com/hahahumble/speechgpt) <br> ![GitHub Repo stars](https://img.shields.io/github/stars/hahahumble/speechgpt?style=social)|Free, requires OpenAI API key|


### Speech Recognition
| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- |
| whisper | OpenAI's open-source model for robust speech recognition via large-scale weak supervision | [GitHub](https://github.com/openai/whisper) <br> ![GitHub Repo stars](https://img.shields.io/github/stars/openai/whisper?style=social)| Free |
| buzz | An open source desktop software based on OpenAI's Whisper to recognize speech and generate subtitles | [GitHub](https://github.com/chidiwilliams/buzz) </br> ![GitHub Repo stars](https://img.shields.io/github/stars/chidiwilliams/buzz?style=social)| Free |
| WhisperDesktop| Open-source Windows desktop application based on OpenAI's Whisper. It uses the GPU for processing, which is faster than the CPU given a capable GPU.|[GitHub](https://github.com/Const-me/Whisper) ![GitHub Repo stars](https://img.shields.io/github/stars/Const-me/Whisper?style=social)|Free|
| whisperX | WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)| [whisperX](https://github.com/m-bain/whisperX) ![GitHub Repo stars](https://img.shields.io/github/stars/m-bain/whisperX?style=social) |Free|
| whisper-web | ML-powered speech recognition directly in your browser. Built with [Transformers.js](https://github.com/xenova/transformers.js). [Demo](https://huggingface.co/spaces/Xenova/whisper-web) | [GitHub](https://github.com/xenova/whisper-web) ![GitHub Repo stars](https://img.shields.io/github/stars/xenova/whisper-web?style=social)|Free|

### Text To Speech
| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- | 
| Azure Text to speech| The best and most realistic voice tools currently available| [URL](https://speech.microsoft.com/portal/voicegallery) |Paid / 500,000 characters per month free|
| coqui-ai/tts | A deep learning toolkit for Text-to-Speech, battle-tested in research and production <br> Online Demo: https://huggingface.co/spaces/coqui/xtts| [Github](https://github.com/coqui-ai/tts) ![GitHub Repo stars](https://img.shields.io/github/stars/coqui-ai/tts?style=social) | Free|
| elevenlabs | Intelligent AI Text to Speech |[URL](https://elevenlabs.io/)|Free/Paid|
| netease-youdao/EmotiVoice | A Multi-Voice and Prompt-Controlled TTS Engine. EmotiVoice speaks both English and Chinese, and with over 2000 different voices. The most prominent feature is emotional synthesis, allowing you to create speech with a wide range of emotions, including happy, excited, sad, angry and others.|[Github](https://github.com/netease-youdao/EmotiVoice) ![GitHub Repo stars](https://img.shields.io/github/stars/netease-youdao/EmotiVoice?style=social)| Free|
| tetos |A unified interface for multiple Text-to-Speech (TTS) providers. Supported TTS providers: Edge TTS, OpenAI TTS, Azure TTS, Google TTS, Volcengine TTS, Baidu TTS|[Github](https://github.com/frostming/tetos) ![GitHub Repo stars](https://img.shields.io/github/stars/frostming/tetos?style=social)|Free|
| ChatTTS |ChatTTS is a text-to-speech model designed specifically for dialogue scenarios such as LLM assistants. It supports both English and Chinese. The model is trained on 100,000+ hours of Chinese and English data. Website: https://chattts.com/|[Github](https://github.com/2noise/ChatTTS) ![GitHub Repo stars](https://img.shields.io/github/stars/2noise/ChatTTS?style=social)|Free|

### Music Recognition
| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- |
|Shazam| Download the Shazam app for music recognition, which is pretty fast |[URL](https://www.shazam.com/)| Free|

### Voice Processing
| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- | 
|so-vits-svc| SoftVC VITS Singing Voice Conversion.|[GitHub](https://github.com/svc-develop-team/so-vits-svc) ![GitHub Repo stars](https://img.shields.io/github/stars/svc-develop-team/so-vits-svc?style=social)|Free|
|vocalremover| Extract vocal and music|[URL](https://vocalremover.org/)|Free|
|lalal.ai|Extract vocals, accompaniment, and various instruments from any audio or video|[URL](https://www.lalal.ai/)|Free/Paid|

### AI generated music or sound effects
| Name | Description | Links | Fees |
| ---- | -------------------------- | --- | --- |
|suno.ai|The AI music creation tool Suno can generate custom songs based on text prompts in mere seconds. [You can create your own AI songs with this new Copilot extension](https://www.theverge.com/2023/12/19/24008279/microsoft-copilot-suno-ai-music-generator-extension)|[URL](https://www.suno.ai/)|Free/Paid|
|udio|Create music from simple text prompts by specifying topics, genres, and other descriptors which are then transformed into professional quality tracks.|[URL](https://www.udio.com/)||
|elevenlabs/sound-effects|Imagine a sound and bring it to life, or explore a selection of the best sound effects generated by the community.|[URL](https://elevenlabs.io/app/sound-effects)|Free|
|suno-ai/bark|Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects.|[Github](https://github.com/suno-ai/bark) ![GitHub Repo stars](https://img.shields.io/github/stars/suno-ai/bark?style=social)|Free|
|audiocraft|Open source library for audio/music generation by Meta, which mainly includes two models, MusicGen: text-to-music model, AudioGen: text-generated sound model. [MusicGen Online Demo](https://huggingface.co/spaces/facebook/MusicGen)|[GitHub](https://github.com/facebookresearch/audiocraft)  </br> ![GitHub Repo stars](https://img.shields.io/github/stars/facebookresearch/audiocraft?style=social)|Free|
|Stable Audio|AI music and sound effect generation application by stability.ai|[URL](https://www.stableaudio.com/)|Free/Paid|
|OptimizerAI|Sound effect generation <br>[Official Introduction](https://twitter.com/OptimizerAI/status/1779881263358419243)|[URL](https://www.optimizerai.xyz/) |Free/Paid|
|SFX Engine|AI Sound effect generation |[URL](https://sfxengine.com/) |Free/Paid|

### Speech translation
| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- | 
| Seamless |Seamless is a family of AI models that enable more natural and authentic communication across languages. [Online Demo](https://seamless.metademolab.com/expressive?utm_source=metaai&utm_medium=web&utm_campaign=fair10&utm_content=blog)|[Github](https://github.com/facebookresearch/seamless_communication) ![GitHub Repo stars](https://img.shields.io/github/stars/facebookresearch/seamless_communication?style=social)|Free|

### Video Creation

| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- |
| KLING AI|AI video creation tool by Kuaishou. |[URL](https://klingai.com/)|Free/Paid|
| Dream Machine|By Luma AI. Dream Machine is an AI model that makes high quality, realistic videos fast from text and images. [Official introductory video](https://www.youtube.com/watch?v=Zb3tffmBPRE)|[URL](https://lumalabs.ai/dream-machine)|Free/Paid|
| Sora | Sora is an AI model published by OpenAI that can create realistic and imaginative scenes from text instructions. Access is not fully open; some visual artists, designers, and filmmakers have been given access. | [URL](https://openai.com/sora) | - |
| capcut | Text-to-speech from subtitles, speech recognition, and convenient, powerful video editing|[URL](https://www.capcut.com/)|Free/Paid|
| Runway | Gen-2: Text/Image to video <br> Gen-1: Video to video. Featured video: https://runwayml.com/staff-picks | [URL](https://runwayml.com/) | Paid/Free trial|
| Pika | Text/Image to video |[URL](https://pika.art/home)|Paid/Free trial|
| Fliki | A website that converts text into audio and video | [URL](https://fliki.ai) | Free/Paid |
| d-id | Generates narrated digital-human videos from text | [URL](https://studio.d-id.com) | Paid/Free trial|
| HeyGen | Generates narrated digital-human videos from text | [URL](https://app.heygen.com/) | Paid/Free trial|
| AnimateDiff |  AnimateDiff is a plug-and-play module turning most community models into animation generators, without the need of additional training.| [Github](https://github.com/guoyww/AnimateDiff) ![GitHub Repo stars](https://img.shields.io/github/stars/guoyww/AnimateDiff?style=social)|Free|
|vivago.ai/video|Text to Video; Image to Video; 4K enhance|[URL](https://vivago.ai/video)|Free|


### Video Content Summary
| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- |
| ChatGPT for YouTube | Chrome extension that quickly summarizes YouTube video content. Requires logging in to a ChatGPT account or providing an API key | [URL](https://chatgpt4youtube.com/)| Free |
| Chat Youtube | Give it a YouTube link and it returns a summary; you can then ask it questions about the content of the video |[URL](https://chatyoutube.com) | Free |

### OCR
| Name | Description | Links | Fees | 
| ---- | ----------------------------- | --- | --- |
|Umi-OCR|Comes with a highly efficient offline OCR engine. As long as the computer performance is sufficient, it can be faster than online OCR services.|[Github](https://github.com/hiroi-sora/Umi-OCR) ![GitHub Repo stars](https://img.shields.io/github/stars/hiroi-sora/Umi-OCR?style=social)|Free|


[![Star History](https://api.star-history.com/svg?repos=ikaijua/Awesome-AITools&type=Date)](https://star-history.com/#ikaijua/Awesome-AITools&Date)

Awesome-AITools Discord Link: https://discord.gg/7hAvJQME

