Action applynrnoisy

Caution

Needs clusters from action learnnrnoisy

Applies Non-redundant Noisy-OR (NRNO) to the test set (requires learned clusters). NRNO addresses the redundancy problem of the Noisy-OR aggregation method by clustering rules according to their degree of redundancy before aggregation. Predictions of the rules within a cluster are aggregated using the Maximum approach, which is not susceptible to redundancy. The resulting cluster predictions are then aggregated using the Noisy-OR approach. Redundancy between two rules \(r_i\), \(r_j\) is measured by the Jaccard Index \(sim(r_i,r_j) = |\hat{H}_{r_i} \cap \hat{H}_{r_j}| / |\hat{H}_{r_i} \cup \hat{H}_{r_j}|\) of their sets of inferred triples \(\hat{H}_{r_i}\), \(\hat{H}_{r_j}\). Because computing the Jaccard Index exactly is very inefficient for large sets, it is estimated using the MinHash scheme, which makes time complexity linear and memory usage constant.
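The aggregation described above can be illustrated with a minimal Python sketch. It is not the implementation used by this action: the function names (`jaccard`, `minhash_signature`, `estimate_jaccard`, `nrno_score`), the rule ids, the choice of BLAKE2b as hash family and the signature length of 128 are all assumptions made for illustration.

```python
import hashlib
from math import prod


def jaccard(a, b):
    """Exact Jaccard Index |a ∩ b| / |a ∪ b| of two sets of inferred triples."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _hash(item, seed):
    # Hypothetical hash family: BLAKE2b with the seed used as salt.
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(repr(item).encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=128):
    """One minimum per hash function, so the signature length is fixed."""
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates the Jaccard Index."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


def nrno_score(fired, clusters):
    """Maximum within each cluster, then Noisy-OR across the cluster maxima.

    fired:    dict rule id -> confidence, for the rules predicting a candidate
    clusters: iterable of collections of rule ids (the learned clustering)
    """
    cluster_scores = []
    for cluster in clusters:
        scores = [fired[r] for r in cluster if r in fired]
        if scores:
            cluster_scores.append(max(scores))
    return 1.0 - prod(1.0 - s for s in cluster_scores)


# Hypothetical example: r1 and r2 are redundant and share a cluster.
fired = {"r1": 0.8, "r2": 0.7, "r3": 0.5}
clusters = [{"r1", "r2"}, {"r3"}]
print(nrno_score(fired, clusters))

h1 = {("john", "speaks", "english"), ("anna", "speaks", "french")}
h2 = {("john", "speaks", "english"), ("tom", "speaks", "dutch")}
print(jaccard(h1, h2), estimate_jaccard(minhash_signature(h1), minhash_signature(h2)))
```

In this example the cluster maxima are 0.8 and 0.5, so the aggregated score is 1 - 0.2 * 0.5 = 0.9, whereas plain Noisy-OR over all three rules would give 1 - 0.2 * 0.3 * 0.5 = 0.97. The MinHash value is only an estimate and approaches the exact Jaccard Index as the number of hash functions grows.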

Configuration file

Input:
  • PATH_TRAINING: Valid path (file)

    Path to training file (absolute or relative), default: train.txt

  • PATH_TEST: Valid path (file)

    Path to test file (absolute or relative), default: test.txt

  • PATH_VALID: Valid path (file)

    Path to validation file (absolute or relative), default: valid.txt

  • PATH_RULES: Valid path (file)

    Path to rules file (absolute or relative), default: rules.txt

  • PATH_CLUSTER: Valid path (file)

    Path to clustering file, default: cluster.txt

Properties:
  • WORKER_THREADS: int

    Number of threads used for computation (-1 means all threads are used), default: -1

  • DISCRIMINATION_BOUND: int

    Discriminates (omits) rules that predict more elements than this bound; 0 means no limit, default: 4000

  • UNSEEN_NEGATIVE_EXAMPLES: int

    The number of negative examples that are assumed to exist although they have not been observed. The higher this number, the more rules with high coverage are favoured, default: 5

  • REFLEXIV_TOKEN: string

    Token used for substitution of reflexive rules (used if the AnyBURL ruleset was trained with REWRITE_REFLEXIV = TRUE), default: me_myself_i

  • TOP_K_OUTPUT: int

    Number of top-ranked results (top-k) kept in the output after filtering, default: 10

  • PREDICT_UNKNOWN: int

    If set to 1, triples containing entities that do not occur in the training set are not skipped. For example, predictions are generated for john speaks UNKNOWN even if UNKNOWN is not in the training set, default: 0

  • ONLY_XY: int

    If set to 1, only cyclic (XY) rules are read from the rules file, default: 0
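To make the effect of UNSEEN_NEGATIVE_EXAMPLES concrete, the following sketch assumes the AnyBURL-style smoothing in which the constant is added to the denominator of the confidence. Whether this action computes confidence exactly this way is an assumption; the function name `smoothed_confidence` and the example counts are hypothetical.

```python
def smoothed_confidence(correct, predicted, unseen_negatives=5):
    """Confidence with an assumed number of unseen negative examples.

    correct:   number of correct predictions of the rule
    predicted: number of all predictions of the rule
    """
    denominator = predicted + unseen_negatives
    return correct / denominator if denominator > 0 else 0.0


# A rule with few predictions versus a rule with high coverage (made-up counts).
for unseen in (0, 5, 50):
    low_coverage = smoothed_confidence(2, 2, unseen)
    high_coverage = smoothed_confidence(90, 100, unseen)
    print(unseen, round(low_coverage, 3), round(high_coverage, 3))
```

Under this assumption, a rule with 2 correct out of 2 predictions outranks a rule with 90 correct out of 100 when the constant is 0, but falls behind it as soon as the constant is 5, and the gap widens as the constant grows.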

Output:
  • PATH_OUTPUT: Valid path (file)

    Path to file used for storing predictions, default: predictions.txt
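The keys and defaults listed above can be collected in one place, as in the sketch below. The `KEY = VALUE` line syntax is an assumption about the configuration file format, and `DEFAULTS` and `render_config` are hypothetical helper names, not part of the tool.

```python
# Keys and default values as documented in this section.
DEFAULTS = {
    "PATH_TRAINING": "train.txt",
    "PATH_TEST": "test.txt",
    "PATH_VALID": "valid.txt",
    "PATH_RULES": "rules.txt",
    "PATH_CLUSTER": "cluster.txt",
    "PATH_OUTPUT": "predictions.txt",
    "WORKER_THREADS": -1,
    "DISCRIMINATION_BOUND": 4000,
    "UNSEEN_NEGATIVE_EXAMPLES": 5,
    "REFLEXIV_TOKEN": "me_myself_i",
    "TOP_K_OUTPUT": 10,
    "PREDICT_UNKNOWN": 0,
    "ONLY_XY": 0,
}


def render_config(overrides=None):
    """Merge overrides into the defaults and render one 'KEY = VALUE' line per key."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown configuration keys: {sorted(unknown)}")
    merged = {**DEFAULTS, **overrides}
    return "\n".join(f"{key} = {value}" for key, value in merged.items())


# Example: keep 100 results per query and read only cyclic rules.
print(render_config({"TOP_K_OUTPUT": 100, "ONLY_XY": 1}))
```

Rejecting unknown keys catches misspellings such as REFLEXIVE_TOKEN early; note that PATH_CLUSTER must point to the clustering written by the action learnnrnoisy.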