11 Inductive Logic Programming (ILP)
11.1 Introduction to ILP
Inductive Logic Programming (ILP) merges logic programming (which supplies the first-order logical formalism) with inductive learning (learning concept descriptions from examples). It performs inference from specific observations to general hypotheses.
11.1.1 Core Idea
The learning process in ILP can be characterized as:
Given:
- Input Knowledge (I)
- Goal (G)
- Background Knowledge (BK)
- Transmutations (a set T) - operators that change knowledge
Determine:
- Output knowledge (O) that satisfies G, obtained by applying transmutations from T to I and BK.
This process is viewed as a goal-driven search in a knowledge space defined by the representation language and search operators.
11.1.2 Advantages of ILP
- Unified Formalism: Uses the same logic-based formalism for data, hypotheses, and background knowledge.
- Expressive Power: High capability to represent complex concepts.
- Human-Readability: Resulting hypotheses are often understandable.
- Background Knowledge Integration: Allows explicit inclusion of domain knowledge.
- Powerful Reasoning: Leverages established mechanisms from logic theory.
11.2 Foundations of Learning in ILP
11.2.1 Types of Inference
A general formula for inference is: P \cup B \vdash C
- P: Premise
- B: Background knowledge
- C: Consequence
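Different inference types instantiate this schema differently. As a minimal illustration (with hypothetical predicates): in deduction, from the premise P = {parent(ann, mary)} and background B = {child(X, Y) \leftarrow parent(Y, X)}, we derive the consequence C = {child(mary, ann)}. Induction runs the schema in the other direction: from the fact child(mary, ann) and the background fact parent(ann, mary), it conjectures the general rule child(X, Y) \leftarrow parent(Y, X) as the consequence.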
11.2.2 Induction in Logics
Given a set of sentences representing beliefs, with a subset as data (\Delta) and the rest as background theory (\Gamma): If \Gamma \not\vdash \Delta (the theory doesn’t explain the data), then \Phi is an inductive conclusion (\Gamma \cup \Delta \vdash \Phi) iff:
- \Gamma \cup \Delta \not\vdash \neg\Phi: The hypothesis \Phi is consistent with \Gamma and \Delta.
- \Gamma \cup \Phi \vdash \Delta: The hypothesis \Phi explains the data \Delta.
In logical terms, given consistent observations (O) and background knowledge (B), find a hypothesis (H) such that B \land H \vdash O: the background knowledge and the hypothesis logically prove the observations.
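For instance (a standard family-relations illustration): with background knowledge B = {parent(ann, mary), female(mary)} and observation O = {daughter(mary, ann)}, the hypothesis H = daughter(X, Y) \leftarrow female(X) \land parent(Y, X) satisfies both conditions above: B \land H \vdash O, and H is consistent with B and O.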
11.2.3 The ILP Learning Task
Given:
- A set of observations O:
- Positive examples E+
- Negative examples E-
- Background knowledge B
- Language of hypotheses LH
- Covering relationship (defines how H explains examples)
Find:
- A Hypothesis H \in LH such that, given B,
- H covers all positive examples (Completeness)
- H covers no negative examples (Consistency)
In ILP, examples are typically ground facts, and B and H are sets of definite clauses.
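To make the covering relationship concrete, here is a minimal Python sketch (all predicate and constant names are illustrative) that tests whether a single definite clause, given ground background facts, covers a ground example. It uses simple backtracking matching of body literals against B rather than full resolution:

```python
def is_var(t):
    """Terms are strings: uppercase initial = variable, lowercase = constant."""
    return t[0].isupper()

def match(lit, fact, subst):
    """Extend `subst` so that literal `lit` equals ground `fact`, else None."""
    (p, args), (q, gargs) = lit, fact
    if p != q or len(args) != len(gargs):
        return None
    s = dict(subst)
    for a, g in zip(args, gargs):
        if is_var(a):
            if s.setdefault(a, g) != g:
                return None          # variable already bound to another constant
        elif a != g:
            return None              # constant mismatch
    return s

def covers(clause, background, example):
    """True iff the clause head matches `example` and every body literal
    can be matched against some fact in `background`."""
    head, body = clause
    def solve(goals, subst):
        if not goals:
            return True
        for fact in background:
            s = match(goals[0], fact, subst)
            if s is not None and solve(goals[1:], s):
                return True
        return False
    s0 = match(head, example, {})
    return s0 is not None and solve(body, s0)

# daughter(X, Y) :- female(X), parent(Y, X).   (illustrative hypothesis)
H = (("daughter", ("X", "Y")),
     [("female", ("X",)), ("parent", ("Y", "X"))])
B = [("parent", ("ann", "mary")),
     ("parent", ("ann", "tom")),
     ("female", ("mary",))]

print(covers(H, B, ("daughter", ("mary", "ann"))))   # True:  positive example covered
print(covers(H, B, ("daughter", ("tom", "ann"))))    # False: negative example rejected
```

With this H, the positive example daughter(mary, ann) is covered (completeness w.r.t. that example) and the assumed negative example daughter(tom, ann) is rejected (consistency).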
11.3 ILP as a Search Problem
The search space in ILP is determined by the language of logic programs (L), where states are concept descriptions (hypotheses). The goal is to find one or more hypotheses satisfying the completeness and consistency criteria.
11.3.1 Structuring the Search Space: Generality
The hypothesis space is structured by a generality relationship: G is more general than S iff covers(S) \subseteq covers(G).
Pruning the search space:
- Positive examples: if H covers a positive example E, then every generalization of H also covers E, so generalizations need not be re-tested against E.
- Negative examples: if H does not cover a negative example E, then no specialization of H covers E, so specializations need not be re-tested against E.
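As a concrete instance (predicates illustrative): G = daughter(X, Y) \leftarrow female(X) is more general than S = daughter(X, Y) \leftarrow female(X), parent(Y, X), because every example covered by S is also covered by G. Accordingly, if S covers a positive example, so does G; and if G misses a negative example, so does S.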
11.3.2 Ordering Relations for Clauses
Understanding ordering is key to navigating the search space. Given a set of clauses S, and C, D, E \in S:
- Quasi-ordering (\leq): Reflexive (C \leq C) and transitive (C \leq D \land D \leq E \Rightarrow C \leq E).
- Partial ordering: A quasi-ordering that is also anti-symmetric (C \leq D \land D \leq C \Rightarrow C = D).
- Total ordering: a partial ordering in which any two elements are comparable, i.e. for any pair C, D, either C \leq D or D \leq C.
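For intuition (standard order-theory examples, not specific to clauses): \leq on the natural numbers is a total ordering; \subseteq on sets is a partial ordering but not total ({1} and {2} are incomparable); and, as the next sections show, generalization relationships such as \theta-subsumption are quasi-orderings that fail anti-symmetry, hence are not partial orderings.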
11.4 Generalization Relationships in ILP
Clauses can be ordered using generalization relationships: implication (\Rightarrow), \theta-subsumption (\leq_{\theta}), and \theta_{OI}-subsumption (Object Identity).
11.4.1 Implication (\Rightarrow)
- Definition: C is more general than D (C \Rightarrow D) if whenever C is true, D is also true.
- Properties:
- Undecidable: No general procedure exists to decide whether C \Rightarrow D.
- Quasi-ordering, not a partial ordering: anti-symmetry fails, since clauses that are not mere renamings of each other can still be logically equivalent.
- Least General Generalization (lgg):
- A clause C is a generalization under implication of S = {D_1, …, D_n} iff \forall i: C \Rightarrow D_i.
- C is an lgg of S if for any other generalization C’ of S, C’ \Rightarrow C.
- The lgg of a finite set of clauses is not computable in general (implication is undecidable) and not unique under implication (a consequence of the quasi-ordering).
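A standard example of implication between clauses (and, as 11.4.2 notes, of its gap with \theta-subsumption) uses recursion: C = p(f(X)) \leftarrow p(X) implies D = p(f(f(X))) \leftarrow p(X), since applying C twice derives p(f(f(t))) from p(t); yet no single substitution maps both literals of C into D.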
11.4.2 \theta-Subsumption (\leq_{\theta})
- Definition: C \theta-subsumes D (C \leq_{\theta} D) iff there exists a substitution \theta such that C\theta \subseteq D (all literals in C\theta are in D).
- Properties:
- Decidable: A procedure exists, but the test is generally NP-complete.
- Quasi-ordering: Not a partial ordering (not anti-symmetric).
- Weaker than implication: C \leq_{\theta} D entails C \Rightarrow D, but not conversely; the recursive example in 11.4.1 shows a clause that implies another without \theta-subsuming it.
- Lattice Structure: The set of clauses (C, \leq_{\theta}) forms a lattice with:
- \land_{C} (glb): Most specific generalization under \theta-subsumption.
- \lor_{C} (lub): Most general specialization under \theta-subsumption.
- The lattice structure admits infinite chains of syntactically different but equivalent clauses, potentially causing non-termination in search.
- Unique lgg: The lgg of two clauses under \theta-subsumption exists and is unique up to equivalence.
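Two worked examples (with generic predicates): (i) C = p(X) \leftarrow q(X) \theta-subsumes D = p(a) \leftarrow q(a), r(a) via \theta = {X/a}, since C\theta = {p(a), \neg q(a)} \subseteq D. (ii) Anti-symmetry fails: C_1 = p(X) \leftarrow q(X) and C_2 = p(X) \leftarrow q(X), q(Y) \theta-subsume each other (C_1 \subseteq C_2 under the empty substitution, and C_2{Y/X} \subseteq C_1), yet they are syntactically different.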
11.4.2.1 Calculating lgg under \theta-Subsumption
- Terms t_1, t_2:
- lgg(t, t) = t
- lgg(f(s_1, …, s_n), f(t_1, …, t_n)) = f(lgg(s_1, t_1), …, lgg(s_n, t_n))
- lgg(f(s_1, …, s_n), g(t_1, …, t_m)) = V (where f \neq g, V is a new variable)
- lgg(s, t) = V (where s \neq t, and at least one is a variable, V is a new variable)
- Atoms a_1, a_2 (same predicate p):
- lgg(p(s_1, …, s_n), p(t_1, …, t_n)) = p(lgg(s_1, t_1), …, lgg(s_n, t_n))
- lgg(p(…), q(…)) is undefined if p \neq q.
- Literals l_1, l_2:
- Positive: lgg(a_1, a_2) as above.
- Negative (\neg a_1, \neg a_2): lgg(\neg a_1, \neg a_2) = \neg lgg(a_1, a_2).
- Opposite signs: Undefined.
- Clauses C_1 = {h_1, …, h_n}, C_2 = {k_1, …, k_m}:
- lgg(C_1, C_2) = {lgg(h_i, k_j) | h_i \in C_1, k_j \in C_2, lgg(h_i, k_j) is defined}
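The following minimal Python sketch implements the term and atom cases of these rules (functor, predicate, and constant names are illustrative). Constants and variables are strings, compound terms are tuples (functor, arg_1, ..., arg_n); as the rules require, the same new variable is reused for repeated pairs of distinct terms, so that lgg(p(a, a), p(b, b)) yields p(V, V) rather than p(V_1, V_2):

```python
class Lgg:
    """lgg under theta-subsumption for terms and atoms."""

    def __init__(self):
        self.fresh = {}   # pair of distinct terms -> its shared new variable
        self.count = 0

    def _var(self, s, t):
        # The SAME new variable must be returned for the same pair (s, t).
        if (s, t) not in self.fresh:
            self.count += 1
            self.fresh[(s, t)] = f"V{self.count}"
        return self.fresh[(s, t)]

    def term(self, s, t):
        if s == t:
            return s                                  # lgg(t, t) = t
        if (isinstance(s, tuple) and isinstance(t, tuple)
                and s[0] == t[0] and len(s) == len(t)):
            # same functor and arity: recurse argument-wise
            return (s[0],) + tuple(self.term(a, b) for a, b in zip(s[1:], t[1:]))
        return self._var(s, t)                        # different functors, or a variable

    def atom(self, a, b):
        if a[0] != b[0] or len(a) != len(b):
            return None                               # different predicates: undefined
        return (a[0],) + tuple(self.term(s, t) for s, t in zip(a[1:], b[1:]))

g = Lgg()
# lgg(p(a, f(a)), p(b, f(b))) = p(V1, f(V1)): the pair (a, b) maps to one variable.
print(g.atom(("p", "a", ("f", "a")), ("p", "b", ("f", "b"))))
```

The clause case then collects the pairwise literal lggs that are defined, as in the set expression above.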
A key issue with \theta-subsumption is that distinct variables can be substituted by the same term.
11.4.3 \theta_{OI}-Subsumption (Object Identity)
This addresses some limitations of \theta-subsumption by assuming that terms denoted by different symbols in a clause must be distinct.
- Definition: C \theta_{OI}-subsumes D iff \exists a substitution \alpha such that C_{OI}\alpha \subseteq D_{OI}, where C_{OI} and D_{OI} extend C and D with the Object Identity constraints (distinct terms in a clause must remain distinct under substitution).
- Properties:
- Weaker than \theta-subsumption.
- Anti-symmetric: equivalent clauses are exactly alphabetic variants (renamings) of each other, unlike under \theta-subsumption or implication.
- Quasi-ordered set: Reflexive and transitive.
- Not a lattice: unlike \theta-subsumption, the lgg is not unique; two clauses may have several incomparable minimal generalizations.
- Redundancy test complexity: O(n^2).
- Each element in the space is an equivalence class.
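As a minimal illustration: C = p(X, Y) \theta-subsumes D = p(a, a) via \theta = {X/a, Y/a}, but under Object Identity this substitution is forbidden, since the distinct variables X and Y would denote the same term; hence C does not \theta_{OI}-subsume D. This is the sense in which \theta_{OI}-subsumption is weaker than \theta-subsumption.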
11.5 Refinement Operators
Refinement operators (generalization and specialization) are used to navigate the structured search space.
11.5.1 Properties of Ideal Refinement Operators
- Local finiteness: Ensures computation terminates for each step.
- Properness: never returns a clause equivalent to its input (avoids looping on equivalent refinements).
- Completeness: every refinement of a clause is reachable, up to equivalence, by a finite chain of refinement steps.
- Optimality: Provides a single, unique path to any refinement.
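For example (with generic predicates): two typical downward refinements of C = p(X, Y) \leftarrow q(X) are obtained by (i) binding a variable, e.g. applying {Y/a} to get p(X, a) \leftarrow q(X), or (ii) adding a body literal, e.g. p(X, Y) \leftarrow q(X), r(Y); both results are specializations of C. The properties above constrain how an operator may chain such steps.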
11.5.2 Existence of Ideal Operators
- Do not exist under \theta-subsumption.
- Do exist under Object Identity (\theta_{OI}-subsumption).
11.6 Example System: InTheLEx (Incremental Theory Learner from Examples)
InTheLEx is an ILP system with the following characteristics:
- Fully incremental: Learns from examples one by one.
- Multi-conceptual: Can learn multiple concepts.
- Closed-loop feedback: Incorporates learned knowledge.
- Hierarchical theories: Can learn structured knowledge.
- Handles Positive and Negative examples.
- Total storage: Retains all information.
- Multi-strategy: Employs various reasoning techniques.
- Based on Object Identity.