Computer science

The dominating model of a computer.

The model is extremely simple, but has been proven to be able to solve all the problems that any reasonable computer model can solve, thus its adoption as the "default model".

The smallest known Turing machine that cannot be proven to halt or not as of 2019 has 7,918 states: www.scottaaronson.com/blog/?p=2725. Shtetl-Optimized by Scott Aaronson is just the best website.

A bunch of non-reasonable-looking computers have also been proven to be Turing complete for fun, e.g. Magic: The Gathering.

Universal Turing machine

A Turing machine that simulates another Turing machine/input pair that has been encoded as a string.

In other words: an emulator!

The concept is fundamental to state several key results in computer science, notably the halting problem.
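A minimal sketch of such an emulator in Python. It assumes the string encoding that appears in Busy Beaver Challenge URLs (one `_`-separated row per state, and for each read symbol a write/move/next-state triple, with `Z` as the halting state). The function names `parse_machine` and `simulate` are made up for this illustration.

```python
def parse_machine(code):
    """Decode e.g. '1RB1LB_1LA1RZ' into {(state, read): (write, move, next_state)}."""
    table = {}
    for index, row in enumerate(code.split("_")):
        state = chr(ord("A") + index)
        for read in (0, 1):
            write, move, next_state = row[3 * read : 3 * read + 3]
            if write == "-":
                continue  # '---' marks an undefined transition
            table[(state, read)] = (int(write), 1 if move == "R" else -1, next_state)
    return table


def simulate(code, max_steps):
    """Run the encoded machine on a blank tape.

    Returns (halted, steps, ones). The step cutoff is needed because the
    emulator cannot know in general whether the machine will ever halt.
    """
    table = parse_machine(code)
    tape = {}  # sparse tape: position -> symbol, missing cells are 0
    pos, state, steps = 0, "A", 0
    while steps < max_steps:
        key = (state, tape.get(pos, 0))
        if key not in table:
            # Undefined transition: treated here as halting without a step.
            return True, steps, sum(tape.values())
        write, move, state = table[key]
        tape[pos] = write
        pos += move
        steps += 1
        if state == "Z":
            return True, steps, sum(tape.values())
    return False, steps, sum(tape.values())


# 2-state busy beaver champion: halts after 6 steps with four 1's on the tape.
print(simulate("1RB1LB_1LA1RZ", 100))
```

The machine being simulated is just data (a string), which is the whole point: one fixed program can run any other machine.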

Turing complete

A computer model that is as powerful as the most powerful computer model we have: the Turing machine!

Chomsky hierarchy

This is the classic result of formal language theory, but there is too much slack between context free and context sensitive, which is PSPACE (larger than NP!).

By Noam Chomsky.

A good summary table that opens up each category much more can be seen e.g. at the bottom of en.wikipedia.org/wiki/Automata_theory under the summary thingy at the bottom entitled "Automata theory: formal languages and formal grammars".

Recursively enumerable language

There is a Turing machine that halts for every member of the language with the answer yes, but does not necessarily halt for non-members.

Non-examples: cs.stackexchange.com/questions/52503/non-recursively-enumerable-languages

RE (complexity)

Recursive language

Subset of recursively enumerable language as explained at: difference between recursive language and recursively enumerable language.

R (complexity)

Set of all decision problems solvable by a Turing machine, i.e. that decide if a string belongs to a recursive language.

Undecidable problem

Is a decision problem of determining if something belongs to a non-recursive language.

Or in other words: there is no Turing machine that always halts for every input with the yes/no output.

Every undecidable problem must obviously have an infinite number of "possibilities of stuff you can try": if there is only a finite number, then you can brute-force it.

Some undecidable problems correspond to a recursively enumerable language, e.g. the halting problem.

Lists of undecidable problems.

Coolest ones besides the obvious boring halting problem:

mortal matrix problem
Diophantine equation existence of solutions: undecidable Diophantine equation problems

Undecidability requires infinitely many inputs

If there are only finitely many inputs, we can always construct a (potentially exponentially huge) Turing machine that hardcodes the outcome for every possible input, so the problem is never undecidable.

The problem is of course deciding and proving the outcome for each possible input, notably as it is possible that calculation for some of the inputs may be independent from ZFC.

Mortal matrix problem

en.wikipedia.org/wiki/Zero_matrix#Occurrences

One of the most simple to state undecidable problems.

The reason that it is undecidable is that you can repeat each matrix any number of times, so there isn't a finite number of possibilities to check.
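A rough sketch of why only one direction is easy: we can search products of increasing length and stop as soon as the zero matrix shows up, but a failed search up to any finite length proves nothing. The function name `find_mortal_product` and the `max_length` cutoff are hypothetical, chosen only for this illustration.

```python
from itertools import product


def matmul(a, b):
    """Multiply two square matrices given as tuples of tuples."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def find_mortal_product(matrices, max_length):
    """Return indices of a product equal to the zero matrix, or None.

    None only means that no product of at most max_length factors is zero:
    longer products remain unchecked, which is where undecidability hides.
    """
    mats = [tuple(tuple(row) for row in m) for m in matrices]
    n = len(mats[0])
    zero = tuple(tuple(0 for _ in range(n)) for _ in range(n))
    for length in range(1, max_length + 1):
        for indices in product(range(len(mats)), repeat=length):
            result = mats[indices[0]]
            for i in indices[1:]:
                result = matmul(result, mats[i])
            if result == zero:
                return indices
    return None


a = [[1, 0], [0, 0]]
b = [[0, 0], [0, 1]]
print(find_mortal_product([a, b], 3))  # a times b is the zero matrix
```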

Computable problem

decidable problem is to a decision problem
like a computable problem is to a function problem

Computable function

Uncomputable function

The prototypical example is the Busy beaver function, which is the easiest example to reach from the halting problem.

Busy beaver function

Computable number

math.stackexchange.com/questions/462790/are-there-any-examples-of-non-computable-real-numbers

There are only boring examples of taking an uncomputable language and converting it into a number?

Difference between recursive language and recursively enumerable language

stackoverflow.com/questions/33467040/what-is-the-difference-between-recursive-and-recursively-enumerable-languages/65455863#65455863

Recursive set

Same as recursive language but in the context of the integers.

Context-free language

Regular language

Regular expression

Computational problem

The list: complexityzoo.uwaterloo.ca/Complexity_Zoo

Decision problem

Computational problem where the solution is either yes or no.

When there are more than two possible answers, it is called a function problem.

Decision problems come up often in computer science because many important problems are stated in terms of "decide if a given string belongs to a given formal language".

Undecidable problem

Halting problem

The canonical undecidable problem.

Turing machine decider

A Turing machine decider is a program that decides if one or more Turing machines halt or not.

Of course, because of what we know about the halting problem, there cannot exist a single decider that decides all Turing machines.

E.g. The Busy Beaver Challenge has a set of deciders clearly published, which decide a large part of BB(5). Their proposed deciders are listed at: discuss.bbchallenge.org/c/deciders/5 and actually applied ones at: bbchallenge.org.

But there are deciders that can decide large classes of Turing machines.

Many (all/most?) deciders are based on simulation of machines with arbitrary cutoff hyperparameters, e.g. the cutoff space/time of a Turing machine cycler decider.

The simplest and most obvious example is the Turing machine cycler decider.

Turing machine regex tape notation

Turing machine regex tape notation is Ciro Santilli's made up name for the notation used e.g. at:

Most of it is just regular regular expression notation, with a few differences:

$0^{\text{inf}}$ denotes the right or left edge of the (zero initialized) tape. It is often omitted, as we just assume it is present on both sides of every regex
A, B, C, D and E denote the current machine state. This is especially common notation in the context of the BB(5) problem
< and > next to the state indicate if the head is on top of the left or right element. E.g.:
```
11 (01)^n <A 00 (0011)^{n+2}
```
indicates that the head A is on top of the last 1 of the last sequence of n 01s to the left of the head.

This notation is very useful, as it helps compress long repeated sequences of Turing machine tape and extract higher level patterns from them, which is how you go about understanding a Turing machine in order to apply Turing machine acceleration.
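To make the example above concrete, here is a small sketch that expands `11 (01)^n <A 00 (0011)^{n+2}` into an explicit tape for a given `n`. The helper name `expand` is made up, and the infinite runs of zeros on both sides are left implicit.

```python
def expand(n):
    """Expand 11 (01)^n <A 00 (0011)^{n+2} for a concrete n.

    Returns (tape, head_index, state). "<A" means the head sits on the
    symbol immediately to the left of the state marker.
    """
    left = "11" + "01" * n
    right = "00" + "0011" * (n + 2)
    return left + right, len(left) - 1, "A"


tape, head, state = expand(2)
print(tape)
print(" " * head + "^ head, state " + state)
```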

Cycler Turing machine

Bibliography: discuss.bbchallenge.org/t/decider-cyclers/33

Example: bbchallenge.org/279081.

These are very simple, they just check for exact state repetitions, which obviously imply that they will run forever.

Unfortunately, cyclers may need to run through an initial setup phase before reaching the initial cycle point, which is not very elegant.

Also, we have no way of knowing the initial setup length or the actual cycle length, so we just need an arbitrary cutoff value.

And unfortunately, this can lead to misses, e.g. Skelet machine #1, a 5 state machine, has a (translated) cycle that starts at around 50-200M steps, and takes about 8 billion steps to repeat.
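A minimal sketch of a cycler decider, not the Busy Beaver Challenge implementation. It assumes the machine string encoding used in bbchallenge.org URLs, with `---` for an undefined transition and `Z` for the halting state; `cycler_decider` and `max_steps` are names chosen for this illustration. Because the configuration includes the absolute head position, translated cycles are deliberately not detected.

```python
def parse_machine(code):
    """Decode e.g. '0RB---_0LA---' into {(state, read): (write, move, next_state)}."""
    table = {}
    for index, row in enumerate(code.split("_")):
        state = chr(ord("A") + index)
        for read in (0, 1):
            write, move, next_state = row[3 * read : 3 * read + 3]
            if write != "-":
                table[(state, read)] = (int(write), 1 if move == "R" else -1, next_state)
    return table


def cycler_decider(code, max_steps):
    """Return 'cycler', 'halts' or 'unknown' (cutoff reached)."""
    table = parse_machine(code)
    tape, pos, state = {}, 0, "A"
    seen = set()
    for _ in range(max_steps):
        # Full configuration: state, head position and the set of cells holding 1.
        config = (state, pos, frozenset(p for p, v in tape.items() if v == 1))
        if config in seen:
            return "cycler"  # exact repetition: the machine loops forever
        seen.add(config)
        key = (state, tape.get(pos, 0))
        if key not in table:
            return "halts"  # undefined transition
        write, move, state = table[key]
        tape[pos] = write
        pos += move
        if state == "Z":
            return "halts"
    return "unknown"


print(cycler_decider("0RB---_0LA---", 100))  # bounces between two cells forever
print(cycler_decider("1RA---", 100))  # runs right forever, never repeats exactly
```

The second machine obviously never halts, yet the decider can only answer "unknown" for it: that is the kind of case that other deciders have to pick up.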

Translated cycler Turing machine

Bibliography: discuss.bbchallenge.org/t/decider-translated-cyclers/34

Like a cycler, but the cycle starts at an offset.

To prove that the machine runs forever, we check that if the machine only goes left N squares until reaching the repetition, then the repetition must only be N squares long.

Skelet machine #1

Closed Tape Language decider

Described at: www.sligocki.com/2022/06/10/ctl.html

Busy beaver

The busy beaver game consists in finding, for a given $n$, the Turing machine with $n$ states that writes the largest possible number of 1's on a tape initially filled with 0's. In other words, computing the busy beaver function for a given $n$.

There are only finitely many Turing machines with $n$ states, so we are certain that there exists such a maximum. Computing the Busy beaver function for a given $n$ then comes down to solving the halting problem for every single machine with $n$ states.
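A brute-force sketch of this idea for tiny $n$. Since we cannot solve the halting problem, the code uses an arbitrary step cutoff and therefore only yields lower bounds in general; for $n = 2$ the bounds happen to match the known values (four 1's, 6 steps). The names `run` and `busy_beaver_lower_bounds` are made up, and the convention assumed here is that the halting transition still writes, moves and counts as a step.

```python
from itertools import product


def run(table, halt, max_steps):
    """Simulate on a blank tape; return (steps, ones) on halt, else None."""
    tape, pos, state = {}, 0, 0
    for step in range(1, max_steps + 1):
        write, move, next_state = table[state][tape.get(pos, 0)]
        tape[pos] = write
        pos += move
        if next_state == halt:
            return step, sum(tape.values())
        state = next_state
    return None  # cutoff reached: this machine may or may not halt later


def busy_beaver_lower_bounds(n, max_steps):
    """Return (most 1's, most steps) over machines halting within max_steps."""
    halt = n  # states are 0..n-1, and n stands for the halting state
    # Every possible transition: (symbol to write, head move, next state).
    transitions = list(product((0, 1), (-1, 1), range(n + 1)))
    best_ones = best_steps = 0
    # A machine is one transition for each (state, read symbol) pair.
    for flat in product(transitions, repeat=2 * n):
        table = [flat[2 * s : 2 * s + 2] for s in range(n)]
        result = run(table, halt, max_steps)
        if result is not None:
            steps, ones = result
            best_ones = max(best_ones, ones)
            best_steps = max(best_steps, steps)
    return best_ones, best_steps


print(busy_beaver_lower_bounds(2, 50))
```

The two maxima are tracked independently, so in general they may come from different machines. The number of machines grows as $(4(n+1))^{2n}$, which is why this approach stops being practical almost immediately.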

Some variant definitions define it as the number of time steps taken by the machine instead. Wikipedia talks about their relationship, but no patience right now.

The Busy Beaver problem is cool because it puts the halting problem in a more precise numerical light, e.g.:

the Busy beaver function is the most obvious uncomputable function one can come up with starting from the halting problem
the Busy beaver scale allows us to gauge the difficulty of proving certain (yet unproven!) mathematical conjectures

Bibliography:

Step busy beaver

The step busy beaver is a variant of the busy beaver game that counts the number of steps before halting, instead of the number of 1's written to the tape.

As of 2023, it appears that for BB(5) the same machine will win both counts. But this is not necessarily always the case.

Busy beaver function ( $BB (n)$ )

$BB(n)$ is the largest number of 1's written by a halting $n$-state Turing machine on a tape initially filled with 0's.

Video 1. The Boundary of Computation by Mutual Information (2023).

Specific values of the Busy beaver function

The following things come to mind when you look into research in this area, especially the search for BB(5) which was hard but doable:

it is largely recreational mathematics, i.e. done by non-professionals, a bit like the aperiodic tiling. Humbly, they tend to call their results lemmas
complex structure emerges from simple rules, leading to a complex classification with a few edge cases, much like the classification of finite simple groups

Bibliography:

Turing machine acceleration

Turing machine acceleration refers to using high level understanding of specific properties of specific Turing machines to be able to simulate them much faster than naively running the simulation as usual.

Acceleration allows one to use simulation to find infinite loops that might be very long, and would not be otherwise spotted without acceleration.

This is for example the case of www.sligocki.com/2023/03/13/skelet-1-infinite.html proof of Skelet machine #1.

Busy Beaver Challenge

bbchallenge.org/story

Project trying to compute BB(5) once and for all. Notably it has better presentation and organization than any other previous effort, and appears to have grouped everyone who cares about the topic as of the early 2020s.

Very cool initiative!

By 2023, they had basically decided every machine: discuss.bbchallenge.org/t/the-30-to-34-ctl-holdouts-from-bb-5/141

In June 2024 they felt that they had verified the result after a full Coq proof was published:

So now onto BB(6) I guess.

BB(5) (Busy beaver function of 5)

The last value we will likely ever know for the busy beaver function! BB(6) is likely completely out of reach forever.

By 2023, it had basically been decided by The Busy Beaver Challenge as mentioned at: discuss.bbchallenge.org/t/the-30-to-34-ctl-holdouts-from-bb-5/141, pending only further verification. It is going to be one of those highly computational proofs that will need to be formally verified before people finally consider it settled.

As that project beautifully puts it, as of 2023 prior to full resolution, this can be considered the "simplest open problem in mathematics" on the Busy beaver scale.

Marxen-Buntrock machine (1989, 4098 1's, ~47M steps)

Best busy beaver machine known since 1989 as of 2023, before a full proof of all 5 state machines had been carried out.

Entry on The Busy Beaver Challenge: bbchallenge.org/1RB1LC_1RC1RB_1RD0LE_1LA1LD_1RZ0LA

Paper extracted to HTML by Heiner Marxen: turbotm.de/~heiner/BB/mabu90.html

Skelet’s machines (2003)

List on The Busy Beaver Challenge: bbchallenge.org/skelet

Bibliography:

Skelet machine #1 (proved 2023, cycle start: 50-200M, period: ~8B)

On The Busy Beaver Challenge: bbchallenge.org/68329601

Skelet machine #1 is infinite

Non formal proof with a program March 2023: www.sligocki.com/2023/03/13/skelet-1-infinite.html Awesome article that describes the proof procedure.

Formal proof August 2023: discuss.bbchallenge.org/t/skelet-1-is-a-translated-cycler-coq-agrees/166

The proof uses Turing machine acceleration to show that Skelet machine #1 is a Translated cycler Turing machine with humongous cycle parameters:

start between 50-200 M steps, not calculated precisely on the original post
period: ~8 billion steps

BB(6) (Busy beaver function of 6)

BB(6) is hard

A hard problem has been found for it, and it was called the "antihydra":

news.ycombinator.com/item?id=40864949 BB(6), The 6th Busy Beaver Number, is harder than a Collatz-like math problem
www.reddit.com/r/math/comments/1dubva0/finding_the_6th_busy_beaver_number_%CF%836_aka_bb6_is/ "Finding the 6th busy beaver number (Σ(6), AKA BB(6)) is at least as hard as a hard Collatz-like math problem called Antihydra":
www.reddit.com/r/compsci/comments/1duc62e/finding_the_6th_busy_beaver_number_%CF%836_aka_bb6_is/

Antihydra (28 Jun 2024)

The Antihydra is the first hard-looking problem for BB(6), what some would classify as a Collatz-like problem.

It is documented on the Busy Beaver Challenge wiki at: wiki.bbchallenge.org/wiki/Antihydra

Antihydra in Magic: The Gathering

Some dude recreated the antihydra on Magic: The Gathering at: aesort.com/antihydra, probably: x.com/IsaacKing314/status/1870637729375219740.

It is known that Magic: The Gathering is Turing complete, but it is cool to have a concrete specific example of an open problem in mathematics coded in it.

Figure 1. Screenshot of the Antihydra in Magic: The Gathering construction.

Antihydra GMP implementation

gmp/antihydra.c

Also posted at:

wiki.bbchallenge.org/w/index.php?title=Antihydra&oldid=958 But obviously it got deleted, not even a tiny shitpage maintained by 5 people is immune to deletionism
cstheory.stackexchange.com/questions/20978/what-is-the-smallest-turing-machine-where-it-is-unknown-if-it-halts-or-not/53326#53326
cs.stackexchange.com/questions/59344/what-are-very-short-programs-with-unknown-halting-status/162108#162108

Busy beaver scale

The Busy beaver scale allows us to gauge the difficulty of proving certain (yet unproven!) mathematical conjectures!

To do this, people have reduced certain mathematical problems to deciding the halting problem of a specific Turing machine.

A good example is perhaps Goldbach's conjecture. We just make a Turing machine that successively checks for each even number if it is a sum of two primes by naively looping down and trying every possible pair. Let the machine halt if the check fails. So this machine halts iff Goldbach's conjecture is false! See also Conjecture reduction to a halting problem.
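The same construction sketched as a Python program rather than a Turing machine. The function names and the optional `limit` parameter are assumptions added for illustration; `limit` exists only so that the search can be run for a bounded amount of time.

```python
def is_prime(n):
    """Naive trial division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def goldbach_counterexample(limit=None):
    """Return the first even number > 2 that is not a sum of two primes.

    With limit=None this call returns (halts) if and only if Goldbach's
    conjecture is false. With a limit, None means "no counterexample
    found up to the limit".
    """
    n = 4
    while limit is None or n <= limit:
        if not any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1)):
            return n
        n += 2
    return None


print(goldbach_counterexample(1000))
```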

Therefore, if we were able to compute $BB(n)$, we would be able to prove those conjectures automatically, by letting the machine run up to $BB(n)$, and if it hadn't halted by then, we would know that it would never halt.

Of course, in practice, $BB$ is generally uncomputable, so we will never know it. And furthermore, even if it were computable, it would take a lot longer than the age of the universe to compute any of it, so it would be useless.
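A toy sketch of that argument for 2-state machines, where the bound is actually known. It assumes the step-counting variant of the function, whose value for 2 states is 6, and machines started on a blank tape. The constant `S2`, the function `halts` and the dictionary encoding of the machines are all made up for this illustration.

```python
S2 = 6  # known maximum number of steps of a halting 2-state, 2-symbol machine


def halts(table, bound=S2):
    """Decide halting on a blank tape for a 2-state machine.

    table maps (state, read) -> (write, move, next_state), 'Z' meaning halt.
    """
    tape, pos, state = {}, 0, "A"
    for _ in range(bound):
        write, move, state = table[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if state == "Z":
            return True
    # Still running after the busy beaver bound: it can never halt.
    return False


champion = {
    ("A", 0): (1, 1, "B"), ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"), ("B", 1): (1, 1, "Z"),
}
# Same machine, except that the halting transition now goes back to A.
looper = dict(champion)
looper[("B", 1)] = (1, 1, "A")

print(halts(champion), halts(looper))
```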

However, philosophically speaking at least, the number of states of the equivalent Turing machine gives us a philosophical idea of the complexity of the problem.

The busy beaver scale is likely mostly useless, since we are able to prove that many non-trivial Turing machines do halt, often by reducing problems to simpler known cases. But still, it is cute.

But maybe, just maybe, reduction to Turing machine form could be useful. E.g. The Busy Beaver Challenge and other attempts to solve BB(5) have come up with a large number of automated (usually parametrized up to a certain threshold) Turing machine decider programs that automatically determine if certain (often large numbers of) Turing machines run forever.

So it is not impossible that after some reduction to a standard Turing machine form, some conjecture just gets automatically brute-forced by one of the deciders. This is a path to automated theorem proving by halting problem reduction.

Turing machine compiler

cs.stackexchange.com/questions/50815/compiler-that-compiles-to-a-turing-machine/161872#161872

Automated theorem proving by halting problem reduction

If you can reduce a mathematical problem to the halting problem of a specific Turing machine, as in the case of a few machines of the Busy beaver scale, then using Turing machine deciders could serve as a method of automated theorem proving.

That feels like it could be an elegant proof method, as you reduce your problem to one of the most well studied representations that exists: a Turing machine.

However it also appears that certain problems cannot be reduced to a halting problem... OMG life sucks (or is awesome?): Section "Turing machine that halts if and only if Collatz conjecture is false".

Conjecture reduction to a halting problem

bbchallenge.org/story#what-is-known-about-bb lists some (all?) cool examples:

BB(15): Erdős' conjecture on powers of 2, which has some relation to Collatz conjecture
BB(27): Goldbach's conjecture
BB(744): Riemann hypothesis
BB(748): independent from the Zermelo-Fraenkel axioms
BB(7910): independent from the ZFC

wiki.bbchallenge.org/wiki/Cryptids contains a larger list. In June 2024 it was discovered that BB(6) is hard.

Turing machine that halts if and only if the Goldbach conjecture is false (27-state)

www.scottaaronson.com/papers/bb.pdf

Turing machine that halts if and only if Collatz conjecture is false

mathoverflow.net/questions/309044/is-there-a-known-turing-machine-which-halts-if-and-only-if-the-collatz-conjectur suggests one does not exist. Amazing.

Intuitively we see that the situation is fundamentally different from the Turing machine that halts if and only if the Goldbach conjecture is false because for Collatz the counter example must go off into infinity, while in Goldbach conjecture we can finitely check any failures.

Amazing.
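A small sketch of the asymmetry. Checking one Goldbach candidate always terminates, but following one Collatz trajectory may not: if a starting value diverged, no finite amount of simulation would ever tell us so. The function name and the `max_steps` cutoff are hypothetical.

```python
def collatz_reaches_one(n, max_steps):
    """Return True if n reaches 1 within max_steps, else None (unknown).

    There is no False answer: a trajectory that has not reached 1 yet
    might still do so later, so giving up is not a disproof.
    """
    steps = 0
    while n != 1:
        if steps == max_steps:
            return None
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return True


print(collatz_reaches_one(27, 1000))
print(collatz_reaches_one(27, 10))
```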

Function problem

A problem whose output can take more than the two possible yes/no values.

It is therefore a generalization of a decision problem.

Busy beaver

Inverse problem

Integer algorithm

We define an "integer algorithm" as an algorithm that takes integer inputs and produces integer outputs.

Integer multiplication

cs.stackexchange.com/questions/16226/what-is-the-fastest-algorithm-for-multiplication-of-two-n-digit-numbers

Integer factorization

Complexity: NP-intermediate as of 2020:

expected not to be NP-complete because that would imply NP = co-NP: cstheory.stackexchange.com/questions/167/what-are-the-consequences-of-factoring-being-np-complete#comment104849_169
expected not to be in P because "could we be that dumb that we haven't found a solution after having tried for that long?"

The basis of RSA: RSA. But not proved NP-complete, which leads to:

Integer factorization algorithm

Shor's algorithm

NP-hard cryptosystem

This is a natural question because both integer factorization and discrete logarithm are the basis for the most popular public-key cryptography systems as of 2020 (RSA and Diffie-Hellman key exchange respectively), and both are NP-intermediate. Why not use something more provably hard?

cs.stackexchange.com/questions/356/why-hasnt-there-been-an-encryption-algorithm-that-is-based-on-the-known-np-hard "Why hasn't there been an encryption algorithm that is based on the known NP-Hard problems?"

Discrete logarithm

Logarithm in a discrete group: given group elements $g$ and $h$, find an integer $k$ such that $g^{k} = h$.

NP-intermediate as of 2020 for similar reasons as integer factorization.

An important case is the discrete logarithm of the cyclic group.

Elliptic curve cryptography

Discrete logarithm of the cyclic group

This is the discrete logarithm problem where the group is a cyclic group.

In this case, the problem becomes equivalent to reversing modular exponentiation.

This computational problem forms the basis for Diffie-Hellman key exchange, because modular exponentiation can be efficiently computed, but no known way exists to efficiently compute the reverse function.
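A toy sketch of that asymmetry. The forward direction uses Python's built-in three-argument `pow`, which does fast modular exponentiation. The reverse direction shown here is plain brute force, not one of the better known algorithms, and the tiny parameters are illustrative only; `discrete_log` is a name made up for this example.

```python
def discrete_log(g, h, p):
    """Smallest k >= 0 with g**k % p == h % p, or None if there is none.

    Brute force: up to p multiplications, hopeless for cryptographic sizes.
    """
    value = 1
    for k in range(p):
        if value == h % p:
            return k
        value = value * g % p
    return None


p, g = 17, 3  # toy modulus and base
secret = 5
public = pow(g, secret, p)  # efficient even for huge numbers
print(public, discrete_log(g, public, p))
```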

Functional problem with array as input

Largest element in an array

www.geeksforgeeks.org/program-to-find-largest-element-in-an-array/

K-th largest element in an array

Simple interview problem!

Longest common subsequence

Note that the subsequences do not need to be contiguous.
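A short dynamic programming sketch in Python, independent of the C++ file listed below. It returns one longest common subsequence; when several exist, which one comes out depends on the tie-breaking in the walk back.

```python
def longest_common_subsequence(a, b):
    """Return one longest common (not necessarily contiguous) subsequence."""
    # lengths[i][j] is the LCS length of a[:i] and b[:j].
    lengths = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                lengths[i][j] = lengths[i - 1][j - 1] + 1
            else:
                lengths[i][j] = max(lengths[i - 1][j], lengths[i][j - 1])
    # Walk back from the bottom-right corner to recover the characters.
    result = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif lengths[i - 1][j] >= lengths[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(result))


print(longest_common_subsequence("AGGTAB", "GXTXAYB"))
```

Time and memory are both proportional to `len(a) * len(b)`.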

Implementations:

cpp/longest_common_subsequence.cpp

On coding challenge websites:

Subset sum problem

Sample implementation:

cpp/subset_sum.cpp

On coding challenge websites:

3SUM

It is cool how even for such a "simple looking" problem, we were still unable to prove optimality as of 2020!

Two sum problem

Algorithm

A solution to a computational problem!

Algorithm cheatsheet

Draft by Ciro Santilli with cross language input/output test cases: github.com/cirosantilli/algorithm-cheat

By others:

github.com/TheAlgorithms/Python

Data structure

Associative array (map, dictionary)

More commonly known as a map or dictionary.

Binary search tree (BST)

B-tree

Like Binary search tree, but each node can have multiple objects and more than two children.

Hash table (Hash map)

Dynamic array

Linked list

Trie

Sample implementations:

C++: cpp/trie.cpp

Recursion (computer science)

Iteration

Iterative algorithm

Sorting algorithm

String-sorting algorithm

Natural sort order

String-search algorithm

Class of algorithm

Greedy algorithm

Dynamic programming

Complexity class

AGI-complete

Time complexity

Quasilinear time ( $O(n \log^{k}(n))$ )

Big O notation family

This is a family of notations related to the big O notation. A good mnemonic summary of all notations would be:

big O notation: $|f| \leq g$
little-o notation: $|f| < g$

Big O notation ( $O (n)$ )

Modulus bounded above, possibly multiplied by a constant:

$$f(x) = O(g(x)) \quad (1)$$

is defined as:

$$\exists M > 0 \ \exists x_{0} \ \forall x > x_{0} : |f(x)| \leq M g(x) \quad (2)$$

E.g.:

$\forall c \in \mathbb{R}, x + c = O(x)$ . For $c < 0$ , $M = 1$ is enough. Otherwise, any $M > 1$ will do, the bottom line will always catch up to the top one eventually.
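As a worked instance of definition (2) for the case $c > 0$, one possible choice of constants (certainly not the only one) is $M = 2$ and $x_{0} = c$:

```latex
\forall x > x_{0} = c > 0 : \quad |x + c| = x + c < x + x = 2x = M x
```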

Little-o notation ( $o (n)$ )

Stronger version of the big O notation, basically means that ratio goes to zero. In big O notation, the ratio does not need to go to zero.

So in informal terms, big O notation means $\leq$, and little-o notation means $<$.

E.g.:

$x = O (x)$
$x \neq o(x)$: the ratio does not tend to zero
$x = O (x^{2})$
$x = o (x^{2})$

Primitive recursive function