I am a second year Computer Science PhD student at Carnegie Mellon University
advised by Prof. Jeremy Avigad in the Hoskinson Center.
My office is 9002 Gates and Hillman Centers.
I was an attendee of the 9th and 11th Heidelberg Laureate Forums.
I am a fellow of the National Science Foundation (NSF) Graduate Research Fellowships Program.
I completed my undergraduate degrees in Mathematics and Computer Science with a minor in Music at UC Berkeley.
If what they were working on was not important and was not likely to lead to important things, then why were they working on them?
— Richard Hamming
To me, the most important problem is Automated Theorem Proving (ATP).
Proving arbitrary theorems from mathematics is definitionally the hardest problem in computer science.
Solving this one problem would solve all formal problems at once.
The following diagram from descriptive complexity theory shows complete problems for increasingly difficult complexity classes:
Recognizable / Semi-Decidable / Turing Complete —
Dependent Type Theory (General ATP)
ATP is the only formal problem with no known algorithm matching human performance.
Such an algorithm has applications in formal verification, mathematical discovery, generative design, and all manner of fields with elements of deductive reasoning or logic.
My goal is to build a human-level automated theorem prover.
My most important philosophy is first-principles thinking.
I start from the end goal and work my way back to the solution.
In ATP, this means designing a maximally simple formal language for mathematics and building in human-inspired general reasoning capabilities from the start.
This is my research project Canonical.
My approach to ATP is unique in a few key ways:
One Unified Robust Approach
Learning from Experience
Metatheoretic Reasoning
Executing Concrete Strategy
Maintaining and Creating Abstraction Barriers
One Unified Robust Approach
The strategy in automated reasoning has been to start with a base language, like SAT, and add individual language extensions over time along with dedicated reasoning to integrate it.
This has worked well for small additions, but I believe it cannot effectively scale to automating real mathematical foundations, like dependent type theory.
Instead, we must start with full expressivity and generality as a design requirement, and devise an algorithm to natively support it.
Canonical does exactly this, introducing an exceptionally elegant axiomatization of dependent type theory along with a sound and complete proof procedure.
The hope is that this language, like the CNF formula, can be a timeless representation of its complexity class.
Improvements to the proof procedure will augment capabilities in all problems in the domain, not merely a single fragment.
Learning from Experience
A mathematician that has never before seen a theorem or a proof can hardly be considered such.
However, we expect our automated reasoning systems to start with a clean slate each time they see a theorem.
The rules of the CASC ATP competition stipulate that “strategies and strategy selection based on individual problems or their solutions are not allowed.”
Without this capability, the ATP system has no choice but to make frantic decisions as though it had never proven something similar.
Furthermore, top entrants to the CASC competition contain hundreds of thousands of lines of code with hand-tuned specialized reasoning procedures.
What is needed is a way to gain capability through experience, just as a human does.
Metatheoretic Reasoning
Much of the reasoning that a mathematician does to solve a problem is not within the problem but is instead about the reasoning problem itself.
For instance, one might notice that a proof must proceed by induction because no other manipulations can make progress.
Likewise, one might show that a given algorithm constitutes a decision procedure for a fragment of the theory, allowing those proofs to be algorithmically generated.
Or, a one may prove that only a subset of the premises are required by showing that any proof using other premises can be converted into one without.
These are all examples of metatheoretic reasoning, where the proof theory itself is the object under study.
Executing Concrete Strategy
Mathematicians tame the space of theorems by going in with a plan in mind.
This is the blueprint for their decisions during the proof.
In the event of a proof failure, the strategy is consequently modified.
SAT, SMT, and FOL solvers make decisions only at the lowest level of the proof.
It is essential that a general automated reasoning system see the forest for the trees.
Maintaining and Creating Abstraction Barriers
Abstraction is the most essential concept in computer science and mathematics.
Without it, there is no hope to reason about objects with complex definitions and no ability to use prior work to support new investigations.
SAT, SMT, and FOL solvers all must break abstraction barriers to clausify and match the expected input format.
It is for this reason that a problem easily solvable by a human may become inscrutable when encoded into SAT.
While this is useful and indeed essential for problems of lower complexity classes, it immediately puts current automated reasoning systems at a disadvantage against humans at general theorem proving.
Canonical
Automated Theorem Proving for General Mathematics.
Canonical exhaustively searches for proofs of any mathematical statement in dependent type theory.
Chase Norman, Tesla Zhang, Anna Zhang, Cheng Zhang, Linxuan Ma, and Shubham Bhargava
The Only Tactic that Always Applies.
Real, Human-Readable Proofs.
Canonical writes proof terms directly as Lean code, visible to you.
Canonical adds no compile time to your project after generation.
Your collaborators and users do not need to install Canonical.
Solves the “Last Mile” of Formalization.
A proof term has a tree structure, so most of the work is near the leaves.
Canonical solves low-depth problems.
The “Last Mile” Problem contributes to the 10-20x overhead in formalizing pen and paper proofs.
Canonical can help close this gap.
A Tour de “Brute Force.”
Sound and Complete inhabitation for Dependent Type Theory.
subst
(λ x y z ↦ Eq (B
)
)
subst
p (λ x y z ↦ Eq (B
)
)
subst A
p (λ x y z ↦ Eq (B
)
)
subst A a
p (λ x y z ↦ Eq (B
)
)
subst A a b p (λ x y z ↦ Eq (B
)
)
subst A a b p (λ x y z ↦ Eq (B
)
)
(substβ
(λ x y z ↦
)
)
subst A a b p
(λ x y z ↦ Eq (B
)
(subst
(λ a b c ↦
)
)
)
(substβ
(λ x y z ↦
)
)
subst A a b p
(λ x y z ↦ Eq (B
)
(subst
(λ a b c ↦
)
) (f
))
(substβ
(λ x y z ↦
)
)
subst A a b p
(λ x y z ↦ Eq (B
)
(subst
(λ a b c ↦
)
) (f
))
(substβ A
(λ x y z ↦
)
)
subst A a b p
(λ x y z ↦ Eq (B
)
(subst
(λ a b c ↦
)
) (f
))
(substβ A a (λ x y z ↦
)
)
subst A a b p
(λ x y z ↦ Eq (B
)
(subst
(λ a b c ↦
)
) (f
))
(substβ A a (λ x y z ↦ B
)
)
subst A a b p
(λ x y z ↦ Eq (B
)
(subst
(λ a b c ↦
)
) (f
))
(substβ A a (λ x y z ↦ B
) (f
))
subst A a b p
(λ x y z ↦ Eq (B
)
(subst A
(λ a b c ↦
)
) (f
))
(substβ A a (λ x y z ↦ B
) (f
))
subst A a b p
(λ x y z ↦ Eq (B
)
(subst A
y
(λ a b c ↦
)
) (f
))
(substβ A a (λ x y z ↦ B
) (f
))
subst A a b p
(λ x y z ↦ Eq (B
) (subst A
y z (λ a b c ↦
)
)
(f
)) (substβ A a (λ x y z ↦ B
) (f
))
subst A a b p
(λ x y z ↦ Eq (B
) (subst A x y z (λ a b c ↦
)
)
(f
)) (substβ A a (λ x y z ↦ B
) (f
))
subst A a b p
(λ x y z ↦ Eq (B
) (subst A x y z (λ a b c ↦ B
)
)
(f
)) (substβ A a (λ x y z ↦ B
) (f
))
subst A a b p
(λ x y z ↦ Eq (B y) (subst A x y z (λ a b c ↦ B
)
)
(f
)) (substβ A a (λ x y z ↦ B
) (f
))
subst A a b p
(λ x y z ↦ Eq (B y)
(subst A x y z (λ a b c ↦ B
) (f
)) (f
))
(substβ A a (λ x y z ↦ B
) (f
))
subst A a b p
(λ x y z ↦ Eq (B y)
(subst A x y z (λ a b c ↦ B
) (f
)) (f y))
(substβ A a (λ x y z ↦ B
) (f
))
subst A a b p
(λ x y z ↦ Eq (B y)
(subst A x y z (λ a b c ↦ B
) (f
)) (f y))
(substβ A a (λ _ y _ ↦ B y) (f
))
subst A a b p
(λ x y z ↦ Eq (B y)
(subst A x y z (λ a b c ↦ B
) (f
)) (f y))
(substβ A a (λ _ y _ ↦ B y) (f a))
subst A a b p
(λ x y z ↦ Eq (B y)
(subst A x y z (λ _ a _ ↦ B a) (f
)) (f y))
(substβ A a (λ _ y _ ↦ B y) (f a))
subst A a b p
(λ x y z ↦ Eq (B y)
(subst A x y z (λ _ a _ ↦ B a) (f x)) (f y))
(substβ A a (λ _ y _ ↦ B y) (f a))
Goals accomplished 🎉
Canonical executes millions of tactics* every second, in parallel, and optimized with intelligent heuristics and pruning.
Canonical searches only type-correct partial terms and only canonical forms.
Canonical can fill function arguments out of order, to leverage unification.
Canonical is goal-directed, only making relevant inferences.
* where a tactic is defined as an elementary intro or apply