Metrics and similarities in modeling dependencies between continuous and nominal data
Journal Title: Zeszyty Naukowe Warszawskiej Wyższej Szkoły Informatyki - Year 2013, Vol 7, Issue 10
Abstract
The analytical paradigm of classification theory deals with continuous data only. Difficulties emerge when data records mix continuous and nominal attributes. Usually, the analytical paradigm treats nominal attributes as continuous ones by coding nominal values numerically, often in a somewhat ad hoc way. We propose a way of keeping nominal values within the analytical paradigm without pretending that they are continuous. The core idea is that the information hidden in nominal values influences the metric (or similarity function) between records of continuous and nominal data. An adaptation step finds the relevant parameters that shape the metric between data records. Our approach works well for classifier induction algorithms in which the metric or similarity is generic, for instance the k-nearest-neighbor algorithm or decision tree induction supported by a similarity function between data records, which we propose here. The k-nn algorithm working on continuous and nominal data performs considerably better when nominal values are processed by our approach. Algorithms of the analytical paradigm that rely on linear and probabilistic machinery, such as discriminant adaptive nearest-neighbor or Fisher's linear discriminant analysis, pose some difficulties. We propose some possible ways to overcome these obstacles for the adaptive nearest-neighbor algorithm.
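To make the core idea concrete, here is a minimal Python sketch of a k-nn classifier whose distance mixes continuous and nominal attributes. It is an illustration under stated assumptions, not the construction from the paper: the abstract does not specify how nominal values enter the metric or how adaptation is performed. Here we assume that each nominal attribute adds a non-negative penalty when two records disagree on it, and that the penalties are chosen by a simple grid search scored with leave-one-out accuracy. The names mixed_distance, knn_predict, loo_accuracy and adapt_weights are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def mixed_distance(a, b, weights):
    """Distance between two records of the form (continuous, nominal).

    Assumption: continuous attributes contribute squared differences and
    nominal attribute j contributes weights[j] when the two values differ.
    """
    (xa, na), (xb, nb) = a, b
    cont = sum((u - v) ** 2 for u, v in zip(xa, xb))
    nom = sum(w for w, u, v in zip(weights, na, nb) if u != v)
    return sqrt(cont + nom)


def knn_predict(train, query, weights, k=3):
    """Majority vote among the k training records closest to the query."""
    nearest = sorted(
        train, key=lambda item: mixed_distance(item[0], query, weights)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, weights, k=3):
    """Leave-one-out accuracy of k-nn under the given nominal weights."""
    hits = 0
    for i, (record, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, record, weights, k) == label
    return hits / len(data)


def adapt_weights(data, n_nominal, grid=(0.0, 0.5, 1.0, 2.0, 4.0), k=3):
    """Hypothetical adaptation step: exhaustive grid search over weights."""
    candidates = product(grid, repeat=n_nominal)
    return max(candidates, key=lambda w: loo_accuracy(data, w, k))


# Toy data: the class depends on the nominal attribute only.
data = [
    (((0.00,), ("red",)), "A"),
    (((0.10,), ("red",)), "A"),
    (((0.20,), ("red",)), "A"),
    (((0.05,), ("blue",)), "B"),
    (((0.15,), ("blue",)), "B"),
    (((0.25,), ("blue",)), "B"),
]
best = adapt_weights(data, n_nominal=1, k=1)
```

On this toy data a zero weight ignores the nominal attribute, so the nearest neighbor always comes from the other class, while any sufficiently large weight separates the classes. The grid search grows exponentially with the number of nominal attributes, so it only serves as a stand-in for whatever adaptation procedure is actually used.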
Authors and Affiliations
Michał Grabowski
New Interpretation of Principal Components Analysis
A new look at principal component analysis is presented. First, a geometric interpretation of the determination coefficient is shown. In turn, the ability to represent the analyzed data and their interdependenci...
Selected issues of data security in computer networks
The security of data transmitted over computer networks is one of the most important tasks of modern teleinformatics. The article presents the basic types of malicious software and example methods...
Constructive algorithms for the resource-constrained project scheduling problem
The article describes the project scheduling problem with limited resource availability and the criterion of minimizing project duration. To solve the problem, constructive algorithms are developed, which...
Optimization of the enterprise marketing strategy using the operations research
An approach is examined to constructing analytical tools for enterprise marketing, designed to select the optimal assortment, sales volume, market segments and product prices and based on the use of the opera...
Algorithms Using List Scheduling and Greedy Strategies for Scheduling in the Flowshop with Resource Constraints
The paper addresses the problem of scheduling in the two-stage flowshop with parallel unrelated machines and renewable resource constraints. The objective is minimization of makespan. The problem is NP-hard. Fast heurist...