2026-05-12
All of my notes and learning experience on CS14003: Artificial Intelligence
This course will introduce you basic concepts in Artificial Intelligence including the following:
Do not be overwhelmed by this. After you read this blog, you will understand it easier. I can not guarantee you will understand all of those concepts after reading this blog. But I can guarantee you will not make the same mistake that I did.
Do you know Depth-First Search, Breadth-First Search and Dijkstra? You may learn it in DSA course, and yes, Global Search is about those (and a bit more). If you are not familiar with those, you can find many visualization and tutorial on the internet about DFS, BFS and Dijkstra.
Global Search algorithms are split to Uninformed search, which consists DFS, BFS and Uniform Cost Search (UCS, aka Dijkstra in disguise), and Informed search, which consists Greedy BFS (GBFS), A* algorithm. Uninformed search algorithm "is given no clue about how close a state is to the goal(s)" while Informed search is. Optional reading will covers Depth-limited search (DLS) and Iterative deepening search (IDS), which are variants of DFS, for uninformed search and Weighted A*, Memory bounded search for informed search. In global search, you should focus about the definition of a "node" (vs a "value"), "frontier", a node being "expanded", "graph search" vs "tree search", and "early-goal" vs "late-goal" test.
In A*, two importants concepts you should remember are Admissibility and Consistency:
Admissible heuristic is when the heuristic never larger than the true cost from that node to the goal. Admissibility allows the heuristic to find the optimal cost.

Figure: Admissible heuristic where estimated cost never exceeds true cost to the goal.
Consistent heuristic is when the moment we expand a node, it is the optimal path to that node. Consistency allows the heuristic to not visit a node twice. Note that Every consistent heuristic is admissible (but not vice versa).

Figure: Consistent heuristic satisfying the triangle inequality condition.
You will learn four common algorithms: Hill climbing, Simulated Annealing, Local Beam search and Evolutionary algorithms. Hill climbing has variants such as Stochastic hill climbing, First-choice hill climbing and Random-restart hill climbing in order to avoid end up at local maxima, ridges (a sequence of local maxima, better than surrounding landscape but global maxima direction is not aligned with agent's available actions) and plateaus (flat area in state-space landscape). Local Beam search iteratively search k path -> choose top k path -> repeat.

Figure: Local beam search keeps the best k=3 candidates at each iteration.
Evolutionary algorithms consists 3 stage: selection, recombination, and mutation then repeat. Selection process uses fitness function (measure how good the current individual is) and roulette method to decide (fitness function higher -> higher chance of being choose from). Recombination happens at crossover points where parents interchange their parts. Mutation is when the model will randomly perturb the offspring's bit to simulate mutation in real world. Evolutionary algorithms have many advantages such as it relies on very little domain knowledge or it haves large number of “tunable” parameters but it lacks of good empirical studies comparing to simpler methods.

Figure: Roulette-wheel selection gives higher-fitness individuals higher sampling probability.

Figure: Evolutionary algorithm loop with selection, crossover, and mutation.
Minimax relate to game theory where two players try to pick actions that maximize/minimize other's score. The algorithm is simple: at MAX nodes, you choose the maximum value of all children and do the opposite for MIN nodes. Alpha-beta pruning's intuition lies in the fact that we can skip some nodes or subtree and still get the same result as minimax. Alpha-beta pruning introduce and which are minimum and maximum value we can get at the current node, respectively. You will learn more in the course but you can simply remember this: "MAX node updated left/alpha ( get updated after traversing one of its subtrees) and MIN node updated right/beta ( get updated after traversing one of its subtrees). We prune all remaining subtrees of this node when and range is invalid, which is (remember the inclusion of = sign)".

Figure: Alpha-beta pruning skips branches that cannot affect the final minimax decision.
These are problems where you have variables that need to assign some value in its domain (allowed value). You will learn "unary/binary constraint" (depend on the number of variables in the constraint), "forward checking" to remove domain value that conflict with existing constraints and "AC-3" (forward checking but can propagate). To solve CSP, we can formulate it as a search problem and use backtracking with some heuristic such as Minimum Remaining Value (MRV), Degree Heuristic (DH) and Least Constraining Value (LCV) to choose which variables and which values to fill.
This is a hard section, so I will try to summarize it. But the core of this section is check whether KB entails or in other words, check whether we can prove that a new logic is wrong (or conflict) compared to current set of logic! Important concept: concept of "satisfiable", resolution refutation, "substitute", "skolemize" and forward/backward chaining algorithms.
First thing we should know is a property of Knowledge-based (KB) agents: "Knowledge-based agents perform reasoning over an internal representation of knowledge to decide what actions to take". KB agents involves a Knowledge base (KB), a set of sentences or facts, and Inference, which is a process where agents derive new sentences from old ones. A simple inference procedure tries to check whether KB entail () a sentence or not. But what is entailment? "Entailment is a relationship between sentences (i.e., syntax) that is based on semantics" and entails if and only if, in every model which is true, is also true.
The syntax in PL can be summarized as this:
We also have concept of "validity" and "satisfiability".
To answer the question "Does KB semantically entail ?", we can use Model-checking approach, Inference rules or Resolution refutation (inverse SAT problem, where we will prove that opposite of conflicts the facts in KB and thus KB should entail ).

Figure: Modus ponens and related inference rules used in propositional reasoning.
When KB only contains definite clauses, we can use forward chaining and backward chaining to do the inference:

Figure: Forward chaining expands known facts until the target query Q is derived.
In propositional logic, knowledge and inference are separate, with inference being domain-independent. The syntax and semantics of first-order logic is built around objects and relations. FOL concepts involves symbols: constant symbol represents object, predicate symbol represents relation and function symbol represents functions.
FOL syntax is more diverse than PL with quantifier and term:
Quantifiers:
There is sign but not , instead we use .
To derive new facts, we can use Universal / Existential Instantiation rule.
For example: We can infer from any of the below:
Some knowledge fragment of inference procedure you should learn:
The inference procedure is the same as PL but we have 2 new things:
Remember how to form the "atmost one", "atleast two" and "atleast one" sentence. Also, REMEMBER TO SKOLEMIZE!
| Machine learning provides machines the ability to learn from past experiences to identify patterns and make predictions Theory covers the following concepts

Figure: Unsupervised learning examples including dimensionality reduction, clustering and association analysis

Figure: Comparison between supervised and unsupervised learning using example questions for temperature.
Then the real part is ID3 Decision Tree: a supervised learning method for classification. The core concept is Divide and conquer (Split data into smaller and smaller subsets using a single variable). We would want to split based on purity so that purity of each split increases. This means that the more we split, the better we can use certain traits to classify samples into the corresponding classes. The metric for this purity is Information Gain, which based on Entropy. The formula for entropy is (random variable V with values having probability ) and the formula for information gain is where is entropy of the whole dataset.

Figure: Full ID3 tree of an example dataset

Figure: Example calculation of entropy for each value in variable and the entropy for the variable

Figure: 4 edge cases illustrating ambiguous split choices.
Strategy to do this part quickly:
Some default value is:
A perceptron has a single neuron with adjustable synaptic weights and a hard limiter (with a defined threshold). Perceptron learning rule is simple: init -> activation -> weight update -> iterate. A fully connected feedforward network with at least three layers allows mapping inputs to specified target value.

Figure: Multi-layer perceptron architecture with input, hidden, and output layers.
The weight update for output layer and hidden layers are different:
The rule to remember is simple: to calculate the weight gradients at any layer , we calculate the backpropagated error signal that reaches that layer from the "afterward" layers, and weight it by the feed-forward signal at feeding into that layer.

Figure: Backpropagation signal flow used to compute gradients and update weights.
And importantly, DO NOT FORGET ABOUT BIAS UPDATE RULE: => IT MEANS NO WEIGHTING FOR THE ERROR TERM
Maybe.