Product Matching using Sentence-BERT: A Deep Learning Approach to E-Commerce Product Deduplication
Journal Title: Engineering and Technology Journal - Year 2024, Vol 9, Issue 12
Abstract
Product matching on e-commerce platforms is challenging because product titles, descriptions, and categorizations vary across vendors. This paper presents a lightweight approach to product matching using Sentence-BERT (SBERT), specifically the all-MiniLM-L6-v2 variant. The method combines text preprocessing, training pair generation, and threshold-based similarity matching to achieve high accuracy at low computational cost. Evaluated on the Pricerunner dataset, the system achieved 98.10% accuracy, 100% precision, and 91.84% recall. The implementation is modular, which eases maintenance and updates, and the matching threshold gives direct control over the precision-recall trade-off. These results suggest that carefully designed preprocessing and training strategies, combined with lightweight transformer models, can achieve state-of-the-art performance in product matching without complex model architectures or extensive computational resources.
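As a rough illustration of the threshold-based matching idea described in the abstract, the sketch below scores pairs of embedding vectors by cosine similarity and labels a pair a match when the score reaches a threshold. It is not the authors' implementation: the function names, the toy two-dimensional vectors, and the threshold values are all assumptions made for illustration. In practice the vectors would come from an SBERT encoder such as all-MiniLM-L6-v2, which is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def predict_matches(pairs, threshold):
    """Label each (vector, vector) pair a match if similarity >= threshold."""
    return [cosine_similarity(a, b) >= threshold for a, b in pairs]


def precision_recall(predicted, actual):
    """Precision and recall of boolean predictions against boolean labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    # Convention assumed here: with no positive predictions (or no positive
    # labels) the corresponding metric is reported as 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical toy "embeddings"; real ones would be SBERT sentence vectors.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # similarity 1.0, true match
    ([1.0, 0.0], [0.8, 0.6]),  # similarity about 0.8, true match
    ([1.0, 0.0], [0.6, 0.8]),  # similarity about 0.6, not a match
    ([1.0, 0.0], [0.0, 1.0]),  # similarity 0.0, not a match
]
toy_labels = [True, True, False, False]

for threshold in (0.5, 0.7, 0.9):
    predicted = predict_matches(toy_pairs, threshold)
    precision, recall = precision_recall(predicted, toy_labels)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.9 removes the false positive but also drops one true match, which is the kind of precision-recall trade-off the threshold is meant to control.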
Authors and Affiliations
Heribertus Yulianton, Rina Candra Noor Santi