KnowFlow
KnowFlow is a hybrid Retrieval-Augmented Generation (RAG) system that combines semantic vector search with a knowledge graph for document processing and querying.
🌟 Features
Advanced Document Processing
- Multi-format support (PDF, DOCX, CSV, TXT)
- Intelligent chunking with configurable size and overlap
- Parallel batch processing with S3 storage
- Document status tracking (PENDING, PROCESSING, INDEXED, FAILED)
- Secure per-user document isolation
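As an illustration of chunking with configurable size and overlap, here is a minimal sketch. It assumes character-based windows; the function name `chunk_text` and its default values are hypothetical, and KnowFlow's own chunker may split on tokens or sentence boundaries instead.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share `overlap` characters, so a sentence cut at a
    boundary still appears whole in at least one neighbouring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

A larger overlap tends to improve recall at chunk boundaries but increases the number of embeddings to store and search.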
Hybrid RAG + Knowledge Graph Architecture
- Dense semantic embeddings via Google Gemini + pgvector
- Structured knowledge extraction to Neo4j
- Multi-hop reasoning through graph relationships
- Automatic entity and relationship mapping
- Query decomposition for complex questions
Smart Query Processing
- Automatic query decomposition for complex questions
- Hybrid vector + graph-based retrieval
- Retrieval quality evaluation and improvement
- Context-aware response synthesis
- Conversation memory with graph context
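Hybrid retrieval needs some way to merge the vector-search results with the graph-based results. This README does not say how KnowFlow does that; the sketch below uses reciprocal rank fusion (RRF), one common technique, purely as an illustration. The function name and the chunk identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of chunk ids into one ranking.

    Each list contributes 1 / (k + rank) per item, so items ranked highly by
    more than one retriever rise to the top. k = 60 is a conventional default.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id to keep the output stable.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


# Hypothetical results from the two retrievers.
vector_hits = ["a", "b", "c"]
graph_hits = ["b", "c", "a"]
merged = reciprocal_rank_fusion([vector_hits, graph_hits])
```

RRF only needs ranks, not comparable scores, which is convenient when one retriever returns cosine similarities and the other returns graph path lengths.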
Chat & Session Management
- Persistent chat sessions with history
- Context-aware follow-up questions
- Session renaming and management
- Message tracking with context preservation
- Multi-user support with isolation
Security & Authentication
- JWT-based authentication
- Secure password hashing with bcrypt
- Role-based access control
- Per-user data isolation
- Document access verification
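Per-user isolation and document access verification can be sketched as follows. The `Document` class, the key layout `users/<user>/documents/<doc>/<file>`, and both function names are assumptions made for this example, not KnowFlow's actual schema.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    owner_id: str
    filename: str


def storage_key(user_id: str, doc_id: str, filename: str) -> str:
    """Build an object-storage key scoped to one user.

    user_id and doc_id are assumed to be server-generated identifiers;
    only the client-supplied filename is sanitized here.
    """
    # Keep only the final path component so "../" cannot escape the prefix.
    base = filename.replace("\\", "/").split("/")[-1]
    # Replace anything outside a conservative character set.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".") or "file"
    return f"users/{user_id}/documents/{doc_id}/{safe}"


def assert_can_access(doc: Document, user_id: str) -> None:
    """Raise PermissionError unless the requesting user owns the document."""
    if doc.owner_id != user_id:
        raise PermissionError(f"user {user_id!r} may not access document {doc.doc_id!r}")
```

Running the ownership check on every read, and not only at upload time, keeps a guessed document id from exposing another user's data.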
Storage & Infrastructure
- S3-compatible object storage
- PostgreSQL for structured data
- Neo4j for graph relationships
- Concurrent file operations
- Efficient batch processing
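The following sketch shows how concurrent batch processing and the four document statuses listed above could fit together. The status names come from this README; `process_batch`, the `index_one` callback, and the in-memory status dictionary are hypothetical stand-ins for the real pipeline and database.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Callable, Dict, Iterable


class DocumentStatus(str, Enum):
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    INDEXED = "INDEXED"
    FAILED = "FAILED"


def process_batch(
    doc_ids: Iterable[str],
    index_one: Callable[[str], None],
    max_workers: int = 4,
) -> Dict[str, DocumentStatus]:
    """Index documents concurrently and record a final status for each."""
    ids = list(doc_ids)
    statuses = {doc_id: DocumentStatus.PENDING for doc_id in ids}

    def run(doc_id: str) -> None:
        # Each worker only writes to its own existing key.
        statuses[doc_id] = DocumentStatus.PROCESSING
        try:
            index_one(doc_id)
        except Exception:
            # One bad document must not abort the rest of the batch.
            statuses[doc_id] = DocumentStatus.FAILED
        else:
            statuses[doc_id] = DocumentStatus.INDEXED

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(run, ids))  # consume the iterator to wait for all workers
    return statuses
```

Threads suit this workload because fetching from object storage and calling an embedding API are mostly I/O-bound.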
🏗️ Architecture
🚀 Quick Start
Prerequisites
- Python 3.8+
- PostgreSQL 14+ with pgvector extension
- Neo4j 5.0+
- S3-compatible storage
- Google Cloud API key for Gemini
Environment Variables
- Start the development server:
🔒 Security Features
- JWT-based authentication with expiration
- Bcrypt password hashing
- Per-user document isolation
- Access control verification
- Secure file storage paths
- Input validation and sanitization
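To make "JWT-based authentication with expiration" concrete, here is a standard-library sketch of issuing and verifying an HS256 token. It is for illustration only: a real deployment would normally rely on a maintained JWT library, and the function names, claim set, and `now` parameter are assumptions made for this example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(user_id: str, secret: bytes, ttl_seconds: int = 3600,
                now: Optional[int] = None) -> str:
    """Create a signed token whose `exp` claim is ttl_seconds in the future."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: bytes, now: Optional[int] = None) -> dict:
    """Return the payload, or raise ValueError if the token is invalid or expired."""
    now = int(time.time()) if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if now >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

The signature is checked before the expiry claim is read, so an attacker cannot extend a token's lifetime by editing the payload.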
📊 Monitoring & Logging
- Structured logging with levels
- Request/response tracking
- Error handling and reporting
- Performance metrics
- Document processing status
- Chat session analytics
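Structured logging with levels can be set up with Python's standard `logging` module, as in the sketch below. The JSON layout, the `build_logger` helper, and the extra field names (`user_id`, `document_id`, `status`, `duration_ms`) are assumptions chosen for this example, not KnowFlow's actual log schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    # Optional fields a caller may attach through the `extra=` argument.
    EXTRA_FIELDS = ("user_id", "document_id", "status", "duration_ms")

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key in self.EXTRA_FIELDS:
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(name: str = "knowflow", stream=None,
                 level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate output if called twice
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

A call such as `log.info("document indexed", extra={"document_id": "d1", "status": "INDEXED", "duration_ms": 42})` then produces one machine-parseable line that carries both the processing status and a simple performance metric.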
🤝 Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
📄 License
This project is licensed under the terms of the LICENSE file included in the repository.