Say you have proprietary data in txt, pdf, doc, xls, etc., and you want to create a chatbot that learns it and summarizes it for you. Why don't you just download Llama or Mistral or something and write a Python script to feed in the data using LangChain or similar? I see people telling me they have to re-train or fine-tune the LLM on the proprietary data. That sounds like overkill. TC 160
Also, converting PDF to something like Markdown for searchability is a huge problem on its own (at least when I last checked, some months ago).
OP is mixing up concepts. Reasons below: 1. Limited context window. Proprietary data is huge; you can't shove the entire proprietary database into the context window. 2. Alternatively, you can convert the proprietary data into a vector database and retrieve only the relevant chunks per query. Many companies already do this. It's called RAG (retrieval-augmented generation).
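To make the RAG idea concrete, here's a minimal toy sketch of the retrieval step in plain Python. The documents, the bag-of-words "embedding," and the `retrieve` helper are all hypothetical stand-ins (real systems use neural embedding models and a proper vector store like the ones LangChain wraps); the point is just that only the best-matching chunk gets put into the LLM prompt, not the whole database.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts.
    # Real RAG uses a neural embedding model instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical proprietary docs, "indexed" as vectors ahead of time.
docs = [
    "Vacation policy: employees get 20 paid days off per year.",
    "Expense policy: meals over 50 dollars need manager approval.",
]
index = [(doc, embed(doc)) for doc in docs]

def retrieve(query, k=1):
    # Return the k chunks most similar to the query.
    q = embed(query)
    ranked = sorted(index, key=lambda p: cosine(q, p[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Only the retrieved chunk goes into the prompt, not the whole database.
question = "how many vacation days do I get?"
context = retrieve(question)[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
```

This is why no fine-tuning is needed for the use case in the question: the model stays frozen, and the proprietary knowledge arrives at inference time through the retrieved context.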
Ahh, so that's what RAG is. Keep hearing the phrase a lot, thx. Gonna look into it.