Frequently Asked Questions about Qsirch On-Premises LLM
Applicable Products
Qsirch 6.0.0
Applicable Firmware:
- QTS 5.1.0 or above
- QuTS hero 5.1.0 or above
- QuTScloud 5.1.0 or above
Overview
Qsirch is a powerful search engine designed exclusively for QNAP NAS, enabling users to quickly locate files and information.
Qsirch 6.0 introduces support for on-premises large language models (LLMs) with RAG Search and RAG multi-turn conversations, delivering smarter, context-aware search capabilities while keeping your data secure and private.
FAQs
Q1: Does the Qsirch on-prem LLM send data outside the NAS?
No. All data processing, analysis, and LLM inference are performed locally on the NAS. No content is uploaded or transmitted externally, ensuring data privacy and security.
Q2: Can I use on-prem LLM if my NAS does not have a GPU?
No. On-prem LLM inference requires GPU computation. Without a GPU, you can connect to a cloud LLM via API for RAG search and still experience AI-powered searches.
Q3: Can on-prem LLM and cloud LLM be used together?
Yes. If your NAS meets the hardware requirements for on-prem LLM and the model is downloaded, and you also connect to a cloud LLM via API, you can freely switch between model sources during RAG search.
Q4: How much NAS storage space is required for model deployment?
It depends on the model size. Model files typically range from a few gigabytes to tens of gigabytes. We recommend storing models on SSDs to reduce loading and inference latency.
Q5: What is RAG multi-turn conversation?
RAG multi-turn conversation allows the AI to retain the current conversation context and provide follow-up analysis and responses based on previous search results, without requiring users to re-enter the full query each time.
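As an illustration of how conversation context can carry over between turns, here is a minimal sketch. It is not Qsirch's implementation: the `Conversation` class, its methods, and the prompt layout are all hypothetical, and the retrieved passages stand in for whatever RAG search returns.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical container for one multi-turn RAG session."""

    # Each turn is stored as a (question, answer) pair.
    turns: list = field(default_factory=list)

    def build_prompt(self, question: str, retrieved: list) -> str:
        """Combine earlier turns, retrieved passages, and the new question."""
        history = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)
        context = "\n".join(f"- {passage}" for passage in retrieved)
        prompt = f"{history}\nContext:\n{context}\nUser: {question}\nAssistant:"
        # On the first turn the history is empty, so drop the leading newline.
        return prompt.lstrip()

    def record(self, question: str, answer: str) -> None:
        """Store a finished turn so later prompts can refer back to it."""
        self.turns.append((question, answer))


conv = Conversation()
first = conv.build_prompt("Find the Q3 report", ["Q3 revenue rose."])
conv.record("Find the Q3 report", "It is in /Reports.")
follow_up = conv.build_prompt("Summarize it", ["Q3 revenue rose."])
```

Because the follow-up prompt contains the earlier question and answer, a short query such as "Summarize it" can be resolved without the user restating what "it" refers to.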
Q6: How long is multi-turn conversation history retained?
The system retains a limited number of multi-turn conversation histories for future reference and searches. When the limit is exceeded, the oldest records, as determined by last modified date, are deleted automatically to maintain performance and a smooth user experience.
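The retention behavior can be pictured with a small sketch. This is an assumption-laden illustration, not Qsirch code: `HistoryRecord`, `prune_history`, and the `max_records` limit are hypothetical names, and the actual limit is not stated here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HistoryRecord:
    """Hypothetical record for one stored conversation."""

    conversation_id: str
    last_modified: datetime


def prune_history(records, max_records):
    """Keep only the max_records most recently modified conversations."""
    if max_records < 0:
        raise ValueError("max_records must not be negative")
    newest_first = sorted(records, key=lambda r: r.last_modified, reverse=True)
    return newest_first[:max_records]


records = [
    HistoryRecord("a", datetime(2024, 1, 1)),
    HistoryRecord("b", datetime(2024, 3, 1)),
    HistoryRecord("c", datetime(2024, 5, 1)),
]
# With a hypothetical limit of 2, the oldest conversation ("a") is dropped.
kept = prune_history(records, max_records=2)
```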
Q7: Does multi-turn conversation significantly impact performance?
The more conversations there are and the longer the context, the more system resources are required. Under typical usage, the impact is minimal. However, when the system handles a large number of files or multiple simultaneous searches, response times may increase. For the best experience, we recommend using multi-turn conversation when sufficient hardware resources (especially GPU/VRAM) are available.
Q8: Can I switch the search scope in RAG search?
Yes. The default search scope is "Global Search." If "Specified Folder Search" is selected, you must choose at least 1 folder and can select up to 50 folders.
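The folder limits above can be expressed as a simple check. The sketch below is illustrative only: the function name, the `"global"` and `"specified"` scope strings, and the folder paths are hypothetical and do not correspond to any Qsirch API.

```python
MIN_FOLDERS = 1   # at least one folder must be selected
MAX_FOLDERS = 50  # at most 50 folders may be selected


def validate_scope(scope, folders=()):
    """Return the list of folders to search, or raise ValueError.

    An empty list means the whole index is searched (Global Search).
    """
    if scope == "global":
        return []
    if scope == "specified":
        selected = list(folders)
        if not MIN_FOLDERS <= len(selected) <= MAX_FOLDERS:
            raise ValueError(
                f"select between {MIN_FOLDERS} and {MAX_FOLDERS} folders, "
                f"got {len(selected)}"
            )
        return selected
    raise ValueError(f"unknown search scope: {scope!r}")


global_scope = validate_scope("global")
folder_scope = validate_scope("specified", ["/Public", "/Projects"])
```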
Q9: Can on-prem LLM be shared across multiple NAS devices?
No. Each NAS must independently deploy and download the model. Model files cannot be directly shared between NAS devices.
Q10: Are model updates performed automatically?
Cloud models are updated alongside Qsirch version updates. On-prem models are updated with the LLM Core on a regular basis. To avoid performance or compatibility issues caused by version changes, users must download the new version of Qsirch or LLM Core.
Q11: Can encrypted folders be included as data sources for RAG search?
Yes, but encrypted folders must be unlocked before searching; otherwise, the system cannot access their contents. Any folder accessible within Qsirch can be used as a source for RAG search.
Q12: Does using API integration with cloud LLM require additional costs?
Yes. Cloud LLM services (such as OpenAI and Google Gemini) are billed according to the provider's API pricing policy. QNAP does not charge additional fees for this integration.
Q13: Why can't the on-prem LLM be started? Is it related to GPU VRAM?
Yes. If the size of the on-prem model exceeds the available GPU memory (VRAM), the system will not be able to load the model successfully, which prevents the feature from starting. To ensure proper operation, check whether your GPU VRAM is sufficient for the selected model. If resources are insufficient, consider switching to a smaller model or upgrading your GPU hardware.
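The VRAM check described above amounts to comparing model size against available GPU memory. The following sketch shows one way to reason about it. The function names, model names, and sizes are hypothetical, and the `overhead_gb` margin for runtime buffers is an assumption rather than a documented Qsirch value.

```python
def fits_in_vram(model_size_gb, free_vram_gb, overhead_gb=1.0):
    """Return True if the model plus an assumed runtime margin fits in VRAM."""
    return model_size_gb + overhead_gb <= free_vram_gb


def pick_model(models, free_vram_gb, overhead_gb=1.0):
    """Return the name of the largest model that fits, or None if none do."""
    candidates = {
        name: size
        for name, size in models.items()
        if fits_in_vram(size, free_vram_gb, overhead_gb)
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical model catalog: name -> approximate size in GB.
models = {"small": 4.0, "medium": 8.0, "large": 16.0}
choice = pick_model(models, free_vram_gb=10.0)
```

If `pick_model` returns `None`, no listed model fits, which corresponds to the case where a smaller model or a GPU with more VRAM is needed.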