ComfyUI Extension: ComfyUI-String-Similarity
ComfyUI String Similarity Node - Advanced text comparison with multiple algorithms
ComfyUI String Similarity Node
A comprehensive text comparison node for ComfyUI that provides multiple algorithms to measure string similarity, useful for OCR validation, text comparison, and quality assessment workflows.
Features
- Multiple similarity algorithms in one node
- Detailed comparison metrics
- Support for both character-level and semantic similarity
- Useful for OCR accuracy assessment
- Compatible with ComfyUI's text processing pipeline
Supported Algorithms
- Levenshtein Distance: Edit distance between strings
- SequenceMatcher: Python's built-in difflib sequence matching
- Jaccard Similarity: Set-based word comparison
- Cosine Similarity: Vector space model comparison
- Word Error Rate (WER): Word-level accuracy metric
- Character Error Rate (CER): Character-level accuracy metric
- SentenceTransformer-MPNET: Deep learning semantic similarity
- SentenceTransformer-MiniLM: Lightweight semantic similarity
Installation
- Clone this repository into your ComfyUI custom_nodes folder:
cd ComfyUI/custom_nodes
git clone https://github.com/yourusername/ComfyUI-String-Similarity
- Install the required dependencies from inside the cloned folder:
cd ComfyUI-String-Similarity
pip install -r requirements.txt
- Restart ComfyUI
Usage
- Add the "String Similarity Node" to your workflow
- Connect two text inputs (actual_text and ocr_text)
- Select the comparison algorithm
- The node outputs a formatted string with similarity metrics
Input Parameters
- actual_text: The reference or ground truth text
- ocr_text: The text to compare (e.g., OCR output)
- algorithm: Choose from 8 different similarity algorithms
Output
- STRING: Formatted result showing the similarity score and additional metrics
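For orientation, the inputs and output above map onto ComfyUI's usual custom node layout roughly as follows. This is a minimal sketch, not the extension's actual source: the class name `StringSimilaritySketch`, the method name `compare`, the `CATEGORY` value, and the two algorithm labels are hypothetical, and only two of the eight algorithms are stubbed in.

```python
from difflib import SequenceMatcher

# Hypothetical option labels; the real node's dropdown values may differ.
ALGORITHMS = ["SequenceMatcher", "Jaccard"]


class StringSimilaritySketch:
    """Illustrative stand-in for the node, not the extension's real class."""

    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI reads this mapping to build the node's input sockets/widgets.
        return {
            "required": {
                "actual_text": ("STRING", {"multiline": True, "default": ""}),
                "ocr_text": ("STRING", {"multiline": True, "default": ""}),
                "algorithm": (ALGORITHMS,),  # a list renders as a dropdown
            }
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "compare"   # name of the method ComfyUI calls
    CATEGORY = "text"      # assumed menu category

    def compare(self, actual_text, ocr_text, algorithm):
        if algorithm == "SequenceMatcher":
            score = SequenceMatcher(None, actual_text, ocr_text).ratio()
        elif algorithm == "Jaccard":
            a, b = set(actual_text.split()), set(ocr_text.split())
            score = len(a & b) / len(a | b) if (a | b) else 1.0
        else:
            raise ValueError(f"Unknown algorithm: {algorithm}")
        report = (
            f"Actual Text: {actual_text!r}\n"
            f"OCR Text: {ocr_text!r}\n"
            f"{algorithm} Similarity: {score:.2f}"
        )
        return (report,)  # ComfyUI expects a tuple matching RETURN_TYPES


# ComfyUI discovers nodes through this module-level mapping.
NODE_CLASS_MAPPINGS = {"StringSimilaritySketch": StringSimilaritySketch}
```

Outside ComfyUI the class can be exercised directly, e.g. `StringSimilaritySketch().compare("Hello World", "Helo World", "SequenceMatcher")`, which returns a one-element tuple holding the formatted report.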
Algorithm Details
Distance-Based Metrics
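- Levenshtein: Minimum number of single-character insertions, deletions, or substitutions needed to turn one string into the other
- SequenceMatcher: Similarity ratio 2*M/T, where M is the number of matching characters and T is the combined length of both strings
- CER/WER: Edit distance divided by the length of the reference text, counted in characters (CER) or words (WER)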
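The distance-based metrics can be illustrated with a short standard-library sketch. It is not the node's implementation (the requirements list the Levenshtein package, which the node presumably uses instead); the function names and the handling of an empty reference are assumptions made for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    if not reference:
        # Assumed convention for an empty reference.
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / number of reference words."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

Because both rates are normalized by the reference length, they can exceed 1.0 when the compared text is much longer than the reference.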
Similarity Metrics
- Jaccard: Intersection over union of word sets
- Cosine: Cosine of the angle between the two texts' vectors
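Both similarity metrics can be sketched with the standard library alone. The requirements list scikit-learn, so the node may well vectorize text differently (for example with TF-IDF weights); this example assumes lowercase whitespace tokenization and raw word counts, and the function names are illustrative.

```python
import math
from collections import Counter


def jaccard_similarity(a, b):
    """Size of the shared word set divided by the size of the combined set."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    if not union:
        return 1.0  # assumed convention: two empty texts are identical
    return len(set_a & set_b) / len(union)


def cosine_similarity(a, b):
    """Cosine of the angle between word-count vectors of the two texts."""
    vec_a, vec_b = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Jaccard ignores how often a word occurs, while the count-based cosine gives repeated words more weight, so the two can rank the same pair of texts differently.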
Semantic Similarity
- SentenceTransformers: Neural models that understand meaning
  - MPNET: More accurate but slower
  - MiniLM: Faster with good accuracy
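A semantic score of this kind is typically the cosine similarity between two sentence embeddings. The sketch below shows that idea under stated assumptions: the model name `all-MiniLM-L6-v2` is a guess at which MiniLM checkpoint is meant, `load_encoder` and `semantic_similarity` are hypothetical helper names, and the sentence-transformers import is deferred so the scoring function can be tried with any encoder.

```python
import math


def load_encoder(model_name="all-MiniLM-L6-v2"):
    """Return an encode function backed by sentence-transformers.

    The model name is an assumption. The import is deferred so the rest of
    this sketch works even when the package is not installed.
    """
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_name).encode


def semantic_similarity(text_a, text_b, encode):
    """Cosine similarity between the embeddings of two texts.

    `encode` takes a list of strings and returns one vector per string.
    """
    vec_a, vec_b = (list(map(float, v)) for v in encode([text_a, text_b]))
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm = math.sqrt(sum(x * x for x in vec_a)) * math.sqrt(sum(y * y for y in vec_b))
    return dot / norm if norm else 0.0
```

With the package installed, `semantic_similarity("A cat sat.", "A feline was sitting.", load_encoder())` would download the model on first use and then score the pair; passing a larger MPNET checkpoint name to `load_encoder` trades speed for accuracy as described above.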
Use Cases
- OCR Quality Assessment: Compare OCR output with ground truth
- Text Validation: Check if generated text matches expected output
- Duplicate Detection: Find similar text passages
- Content Matching: Semantic similarity for paraphrases
Example Output
Actual Text: 'Hello World'
OCR Text: 'Helo World'
Levenshtein Distance: 1
Levenshtein Similarity: 0.91
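The similarity figure in this example is consistent with normalizing the distance by the length of the longer string. That normalization is an assumption inferred from the numbers shown, not a statement about the node's code; the arithmetic is:

```python
actual_text = "Hello World"
ocr_text = "Helo World"
distance = 1  # taken from the example output: one missing "l"

# Assumed normalization: 1 - distance / length of the longer string.
longest = max(len(actual_text), len(ocr_text))  # 11 characters
similarity = 1 - distance / longest             # 1 - 1/11
print(f"Levenshtein Similarity: {similarity:.2f}")  # prints 0.91
```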
Performance Notes
- Simple algorithms (Levenshtein, Jaccard) are fast
- SentenceTransformers require model downloading on first use
- Models are cached after initial download
Troubleshooting
- Import errors: Ensure all requirements are installed
- Model download fails: Check internet connection
- Memory issues: Use MiniLM for lower memory usage
Requirements
- ComfyUI
- Python packages: Levenshtein, scikit-learn, numpy, sentence-transformers
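When chasing import errors, it can help to check which of these packages the active Python environment can actually see. The following is a small sketch using only the standard library; the mapping from package names to import names reflects the usual module names for these packages, and `missing_packages` is a hypothetical helper, not part of the extension.

```python
from importlib.util import find_spec

# Package name (as installed with pip) -> module name (as imported).
REQUIRED_MODULES = {
    "Levenshtein": "Levenshtein",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "sentence-transformers": "sentence_transformers",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the package names whose modules cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Run it with the same Python interpreter that launches ComfyUI; a package installed into a different environment will still show up as missing.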
License
MIT License
Copyright (c) 2024
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.