machine-learning

Cost-sensitive Statistical Relational Learning

In this project, we consider the problem of incorporating domain knowledge about the different weights of positive and negative samples. One motivation is the class-imbalance situation in many relational domains, where the decision boundary can easily be dominated by the majority class and overfit to its outliers. Hence, it is essential to steer the training process toward the minority class by assigning different costs to false positives and false negatives. Beyond the requirements imposed by such data properties, there are also practical demands in certain domains, such as diagnosis in medicine, quality checking in manufacturing, and rating prediction in recommender systems.
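As a minimal sketch of the cost-sensitive idea (on plain propositional data, not the relational setting of this project), the example below assigns asymmetric misclassification costs through per-class weights in scikit-learn; the 10:1 cost ratio is an assumption chosen purely for illustration.

```python
# Minimal sketch: penalize false negatives more heavily than false positives
# on an imbalanced toy dataset via per-class weights.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Imbalanced toy data: roughly 5% positives.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# A false negative costs 10x a false positive (illustrative ratio, not a
# value taken from this project).
clf = LogisticRegression(class_weight={0: 1.0, 1: 10.0}, max_iter=1000)
clf.fit(X, y)

print(confusion_matrix(y, clf.predict(X)))
```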

Knowledge-intensive Learning

In many domains, a considerable number of factors influence the target variable, and the dimension of the parameter space for probabilistic models is exponential in the number of variables, so a significant amount of training data is required to guarantee reasonable prediction accuracy. For this project, we proposed a way to incorporate domain knowledge about independence of causal influence and qualitative constraints, which greatly improves prediction performance by reducing the dimensionality of the feature space as well as constraining the search space.
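A standard example of independence of causal influence is the noisy-OR gate: instead of a full conditional probability table with 2^n entries, each cause carries a single activation probability, so the parameter count grows linearly in the number of causes. The sketch below illustrates this general idea; it is not the model used in this project, and the leak term and probabilities are assumed values.

```python
# Minimal sketch of independence of causal influence via a noisy-OR gate.
import numpy as np

def noisy_or(parent_states, activation_probs, leak=0.01):
    """P(effect = 1 | parents) under the noisy-OR assumption.

    parent_states:    binary vector, 1 if cause i is present
    activation_probs: probability that cause i alone produces the effect
    leak:             probability of the effect when no cause is present (assumed)
    """
    parent_states = np.asarray(parent_states, dtype=float)
    activation_probs = np.asarray(activation_probs, dtype=float)
    # The effect fails only if the leak and every active cause independently fail.
    fail = (1.0 - leak) * np.prod((1.0 - activation_probs) ** parent_states)
    return 1.0 - fail

# Three possible causes, only the first two present.
print(noisy_or([1, 1, 0], [0.8, 0.6, 0.9]))  # ~0.92
```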

Sequence Data Mining

In most realistic domains, variables transition between their possible states over time. The data are generated by dynamic processes with multiple observations at different time points, so dynamic models are needed to capture these transition intensities over time.
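The simplest dynamic model of this kind is a first-order Markov chain over the variable's states. The sketch below (an illustrative stand-in, not this project's model) estimates a transition matrix from an observed state sequence; the uniform fallback for unobserved rows is an assumed choice.

```python
# Minimal sketch: maximum-likelihood estimate of a first-order Markov
# transition matrix from a sequence of observed states.
import numpy as np

def transition_matrix(sequence, n_states):
    """Estimate P(next state | current state) from one state sequence."""
    counts = np.zeros((n_states, n_states))
    for current, nxt in zip(sequence[:-1], sequence[1:]):
        counts[current, nxt] += 1
    # Normalize each row; rows with no observations fall back to uniform
    # (an assumed prior for illustration).
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / n_states)

# Toy sequence over three states observed at successive time points.
obs = [0, 0, 1, 2, 2, 2, 1, 0, 0, 1]
print(transition_matrix(obs, n_states=3))
```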