Benchmarking Text Embedding Models for Multi-Dataset Semantic Textual Similarity: A Machine Learning-Based Evaluation Framework

Journal Title: Acadlore Transactions on AI and Machine Learning - Year 2025, Vol 4, Issue 2

Abstract

The selection of optimal text embedding models remains a critical challenge in semantic textual similarity (STS) tasks, particularly when performance varies substantially across datasets. In this study, the comparative effectiveness of multiple state-of-the-art embedding models was systematically evaluated using a benchmarking framework based on established machine learning techniques. A range of embedding architectures was examined across diverse STS datasets, with similarity computations performed using Euclidean distance, cosine similarity, and Manhattan distance metrics. Performance evaluation was conducted through Pearson and Spearman correlation coefficients to ensure robust and interpretable assessments. The results revealed that GIST-Embedding-v0 consistently achieved the highest average correlation scores across all datasets, indicating strong generalizability. Nevertheless, MUG-B-1.6 demonstrated superior performance on datasets 2, 6, and 7, while UAE-Large-V1 outperformed other models on datasets 3 and 5, thereby underscoring the influence of dataset-specific characteristics on embedding model efficacy. These findings highlight the importance of adopting a dataset-aware approach in embedding model selection for STS tasks, rather than relying on a single universal model. Moreover, the observed performance divergence suggests that embedding architectures may encode semantic relationships differently depending on domain-specific linguistic features. By providing a detailed evaluation of model behavior across varied datasets, this study offers a methodological foundation for embedding selection in downstream NLP applications. The implications of this research extend to the development of more reliable, scalable, and context-sensitive STS systems, where model performance can be optimized based on empirical evidence rather than heuristics. These insights are expected to inform future investigations on embedding adaptation, hybrid model integration, and meta-learning strategies for semantic similarity tasks.
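
The evaluation loop described in the abstract (score each sentence pair with cosine, Euclidean, and Manhattan measures, then correlate the scores with human ratings using Pearson and Spearman coefficients) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names, the toy two-dimensional vectors, and the gold ratings are all hypothetical, and negating the two distances so that larger values mean "more similar" is an assumption made here for convenience. In practice the vectors would come from an embedding model such as those named in the abstract.

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_similarity(u, v):
    # Assumption: negate the distance so that larger means more similar.
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan_similarity(u, v):
    # Assumption: negate the distance so that larger means more similar.
    return -sum(abs(a - b) for a, b in zip(u, v))

def pearson(x, y):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def ranks(values):
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

METRICS = {
    "cosine": cosine_similarity,
    "euclidean": euclidean_similarity,
    "manhattan": manhattan_similarity,
}

def evaluate(pairs, gold):
    # pairs: list of (embedding_a, embedding_b); gold: human similarity ratings.
    report = {}
    for name, metric in METRICS.items():
        scores = [metric(u, v) for u, v in pairs]
        report[name] = {
            "pearson": pearson(scores, gold),
            "spearman": spearman(scores, gold),
        }
    return report

# Hypothetical toy data standing in for model embeddings and gold ratings.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.1]),
    ([1.0, 0.0], [0.5, 0.5]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 0.0], [-1.0, 0.2]),
]
toy_gold = [4.8, 3.0, 1.0, 0.2]

if __name__ == "__main__":
    for metric_name, result in evaluate(toy_pairs, toy_gold).items():
        print(metric_name, result)
```

Running `evaluate` once per model and per dataset, then averaging the coefficients, would give a comparison table of the kind the abstract summarizes. Spearman depends only on rank order, so it is less sensitive than Pearson to the scale differences between the three similarity measures.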

Authors and Affiliations

Sutriawan, Wasis Haryo Sasoko, Zumhur Alamin, Ritzkal

Keywords

Machine learning models; Multi-dataset; Semantic textual similarity (STS); Massive text embedding benchmark (MTEB)

EP ID EP767827
DOI https://doi.org/10.56578/ataiml040202

How To Cite

Sutriawan, Wasis Haryo Sasoko, Zumhur Alamin, Ritzkal (2025). Benchmarking Text Embedding Models for Multi-Dataset Semantic Textual Similarity: A Machine Learning-Based Evaluation Framework. Acadlore Transactions on AI and Machine Learning, 4(2), -. https://europub.co.uk/articles/-A-767827