Action learnnrnoisy

Caution

Needs similarity matrices from action calcjacc

Learning of the optimal thresholds for the clustering used by non redundant noisy or (requires similarity matrices)

Configuration file

Input :
  • PATH_TRAININGValid path (file)

    Path to training file (absolute or relative), default: train.txt

  • PATH_TESTValid path (file)

    Path to test file (absolute or relative), default: test.txt

  • PATH_VALIDValid path (file)

    Path to validation file (absolute or relative), default: valid.txt

  • PATH_RULESValid path (file)

    Path to rules file (absolute or relative), default: rules.txt

  • PATH_JACCARDValid path (directory)

    Path to directory containing jaccard files, default: jaccard/

Properties :
  • WORKER_THREADSint

    Number of threads that are used for computation. (-1 means all threads are used), default: -1

  • DISCRIMINATION_BOUNDint

    Discriminates (omits) rules which predict more elements than this, 0 means no limit., default: 4000

  • UNSEEN_NEGATIVE_EXAMPLESint

    The number of negative examples for which we assume that they exist, however, we have not seen them. Rules with high coverage are favoured the higher the chosen number, default: 5

  • REFLEXIV_TOKENstring

    Token used for substitution of reflexive rules. (Used if AnyBURL ruleset was trained with REWRITE_REFLEXIV = TRUE), default: me_myself_i

  • BUFFER_SIZEint

    Buffer size (in amount of integers, 4 byte) used to limit memory consumption of buffering previously inferred rules. Should only be set if running out of memory. (2500000000 –> ~10 GB), default: Maximum unsigned long long

  • TOP_K_OUTPUTint

    Top-K predictions that are used to calculate the MRR for hyperparameter search, default: 10

  • RESOLUTIONint

    Sets the accuracy of the Jaccard estimation. The number of hash functions used in MinHash (f.e. RESOLUTION = 200 –> 200 hash functions –> Max resolution of Jaccard 1/200), default: 200

  • STRATEGY[grid|random]

    Sets the search strategy to be used for finding optimal clustering, default: grid

  • ITERATIONSint

    Amount of iterations used in random search strategy, default: 10000

  • SEEDint

    Seed for the sampling of thresholds used in random search strategy, default: 0

  • VERBOSEint

    If set to 1, writes the MRR and settings (thresholds) of each iteration of the hyperparameter search to seperate files named {relation}_chk.txt, default: 0

  • ONLY_XYint

    If set to 1, only cyclic (XY) rules are read from the rules file, default: 0

Output :
  • PATH_CLUSTERValid path (file)

    Path to clustering file, default: cluster.txt