SWC:Imprecise Learning

The goals of this project are:
 * to derive an imprecise transaction model for blog articles using fuzzy sets
 * to use or improve an existing association rule mining algorithm so that it works with our transaction model
 * to develop an execution model for the mined association rules:
   * given a goal (predicate), derive the minimum set of transactions that must be in the database for the goal to be true (argumentation); this is typically achieved by a backward chaining engine such as Prolog
   * given a set of transactions, derive the set of all consequences; this is typically achieved by a forward chaining engine such as Jess computing the transitive closure
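The two execution modes listed above can be illustrated with a minimal sketch. It makes several simplifying assumptions: rules are crisp (fuzzy degrees are ignored), each rule has a single consequent, and an item that some rule produces is never stored directly in the database. The names `forward_chain`, `supports`, `minimal_support` and the sample rules are hypothetical and are not taken from Prolog or Jess.

```python
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

# A rule "antecedent -> consequent" over plain item names (crisp simplification).
Rule = Tuple[FrozenSet[str], str]


def forward_chain(facts: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Forward chaining: return every item derivable from `facts`."""
    derived = set(facts)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for antecedent, consequent in rules:
            if antecedent <= derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived


def supports(goal: str, rules: List[Rule],
             seen: FrozenSet[str] = frozenset()) -> List[FrozenSet[str]]:
    """Backward chaining: return sets of base items that make `goal` derivable."""
    if goal in seen:  # cyclic dependency, no support along this path
        return []
    producers = [a for a, c in rules if c == goal]
    if not producers:  # no rule derives it, so it must be in the database
        return [frozenset([goal])]
    results: List[FrozenSet[str]] = []
    for antecedent in producers:
        partial = [frozenset()]
        for item in antecedent:
            options = supports(item, rules, seen | {goal})
            partial = [p | s for p in partial for s in options]
        results.extend(partial)
    return results


def minimal_support(goal: str, rules: List[Rule]) -> Optional[FrozenSet[str]]:
    """Pick one smallest supporting set, or None if the goal is underivable."""
    options = supports(goal, rules)
    return min(options, key=len) if options else None


# Hypothetical mined rules, for illustration only.
RULES: List[Rule] = [
    (frozenset({"spicy", "meat"}), "hot"),
    (frozenset({"hot"}), "red"),
]

print(forward_chain({"spicy", "meat"}, RULES))
print(minimal_support("red", RULES))
```

The backward direction enumerates all supporting sets before choosing the smallest, which is exponential in the worst case; a real engine would prune the search.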

Fuzzy sets are predefined
We assume that a human expert has designed specific fuzzy sets from natural language terms (the tags), each with its membership function. For example, consider two imprecise concepts (modelled as fuzzy sets), $F$ (food) and $C$ (colours):

$F = (U_F, \mu_F)$ where $U_F=\{ potatoes, meat, beef, spicy, hot, orange, apple, ... \}$ and $\mu_F$ is the membership function, e.g., $\mu_F(potatoes)=0.95$, $\mu_F(meat)=0.99$, $\mu_F(orange)=0.65$, ...

$C = (U_C, \mu_C)$ where $U_C=\{ orange, green, blue, yellow, ... \}$ and $\mu_C$ is the membership function, e.g., $\mu_C(orange)=1.00$, $\mu_C(green)=0.85$, $\mu_F(orange)=0.75$, ...
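A minimal sketch of how such predefined fuzzy sets could be represented in code. The `FuzzySet` class and its method names are hypothetical. Only the grades stated above are included, and reading the grade 0.85 for green as a membership in $C$ is an assumption. Terms without a listed grade are treated as having membership 0.0.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class FuzzySet:
    """A fuzzy set given by a table of membership grades over its universe."""
    name: str
    grades: Dict[str, float] = field(default_factory=dict)

    def mu(self, term: str) -> float:
        """Membership function; unlisted terms get grade 0.0 (assumption)."""
        return self.grades.get(term, 0.0)

    def support(self) -> Set[str]:
        """Terms with a strictly positive membership grade."""
        return {t for t, g in self.grades.items() if g > 0.0}


# Grades copied from the examples in the text; the remaining terms are omitted.
F = FuzzySet("food", {"potatoes": 0.95, "meat": 0.99, "orange": 0.65})
C = FuzzySet("colours", {"orange": 1.00, "green": 0.85})

print(F.mu("orange"), C.mu("orange"))  # the same tag belongs to both concepts
```

The tag orange shows why the sets are imprecise: it belongs to both concepts, with a different grade in each.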

Fixed Database Schema versus No Schema
When mining association rules, the database of transactions is assumed to have a fixed schema, e.g., .

Our case is concerned with a non-fixed schema: the transaction modelling and the rule miner would work on one schema or another, e.g.,   or.

Building a transaction set from real data
Suppose we have a set of blog articles modelled as: and let's assume we are interested in the schema

OPEN QUESTION

How can we build a "traditional" fuzzy transaction database (fixed schema) on top of which a rule miner can run an algorithm such as U-Apriori, or a modified version of it?

Note that U-Apriori (or why not a U-FP-Growth?) executes on top of a "fixed schema transaction set".

Sample Comments:

Assuming the data if we are interested in the schema a possible "transaction set" would be:

In the  column I removed   as it does not belong to   ($\mu_F(old)=0.0$), while in the second column   and   do not belong to.
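The removal step described above can be sketched as a projection of each article's tags onto the concepts of the chosen schema. This is only an illustration under stated assumptions: the schema is taken to be the pair of concepts (F, C), the article tags are hypothetical, and the grades for beef and hot are invented for the example. Only $\mu_F(orange)=0.65$, $\mu_C(orange)=1.00$ and $\mu_F(old)=0.0$ come from the text.

```python
from typing import Dict, List

# Membership tables per concept; grades for "beef" and "hot" are hypothetical.
SCHEMA: Dict[str, Dict[str, float]] = {
    "F": {"orange": 0.65, "beef": 0.90, "hot": 0.40, "old": 0.0},
    "C": {"orange": 1.00},
}


def project(tags: List[str],
            schema: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Build one fixed-schema row: for each concept keep only the tags that
    belong to it (grade > 0), together with their membership grades."""
    return {
        concept: {t: grades[t] for t in tags if grades.get(t, 0.0) > 0.0}
        for concept, grades in schema.items()
    }


# A hypothetical article described by four tags.
article_tags = ["old", "orange", "beef", "hot"]
row = project(article_tags, SCHEMA)
print(row)
```

Each row then has exactly one cell per concept, which is the fixed shape a miner such as U-Apriori expects; whether the cell should hold the tag subset or a single number is the question raised next.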

In the non-crisp case,  and    would be the distance of membership of the subsets to their concepts, e.g., $x = \mu (\{orange, beef, hot\} \subset_\mu F)$?
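The text does not fix how the membership of a whole subset in a concept is computed. As a sketch, two common candidate aggregations are shown below: the minimum grade (a pessimistic reading) and the mean grade. Both are assumptions rather than the project's definition, and the grades for beef and hot are hypothetical.

```python
from typing import Dict, Iterable

# Grades in F; only the value for "orange" comes from the text.
MU_F: Dict[str, float] = {"orange": 0.65, "beef": 0.90, "hot": 0.40}


def subset_degree_min(tags: Iterable[str], mu: Dict[str, float]) -> float:
    """Degree of the subset taken as its weakest member's grade."""
    return min((mu.get(t, 0.0) for t in tags), default=0.0)


def subset_degree_mean(tags: Iterable[str], mu: Dict[str, float]) -> float:
    """Degree of the subset taken as the average grade of its members."""
    grades = [mu.get(t, 0.0) for t in tags]
    return sum(grades) / len(grades) if grades else 0.0


subset = ["orange", "beef", "hot"]
x_min = subset_degree_min(subset, MU_F)
x_mean = subset_degree_mean(subset, MU_F)
print(x_min, x_mean)
```

With the minimum, a single weakly related tag pulls the whole cell down; with the mean, it is averaged out. Which behaviour is wanted depends on how the resulting value is later interpreted by the miner.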

Maybe we could then mine with the Two-Phase Algorithm (Mining High-Utility Itemsets from a Database with Utility Information), interpreting this "distance" as utility?