# saQut
> **A compiler built as a toolbox, not a black box —**
> every internal phase is a first-class, inspectable output.
```
saqut tokens file:fib.sqt → token stream, JSON
saqut ast file:fib.sqt → full AST, JSON
saqut ast file:fib.sqt --optimized → constant-folded + DCE'd AST
saqut run file:fib.sqt → execute via IR + bytecode VM
```
Most compilers are black boxes. saQut is a **glass box.**

---
## What is it?
saQut is a **procedural language compiler** written in C++.
The language is small and C-flavoured on purpose: it is a vehicle, not the product.
The product is **a compilation pipeline where every stage is named, queryable, and machine-readable.**

You can pipe `saqut ast` into your own tool.
You can hand a diff of the AST before and after optimization to a review script.
Someone with no access to the compiler source could write a language server (LSP) from `saqut symbols` output alone.
That is the test saQut is designed to pass.

---
## The language looks like this
```c
int fibonacci(int n) {
    if (n <= 1) {
        return n;
    }
    return fibonacci(n - 1) + fibonacci(n - 2);
}

int fibonacciIterative(int n) {
    int first = 0;
    int second = 1;
    for (int i = 0; i < n; i = i + 1) {
        int next = first + second;
        first = second;
        second = next;
    }
    return first;
}

int main() {
    int n = 10;
    print(fibonacci(n));
    print(fibonacciIterative(n));
    return 0;
}
```
- No mandatory `class` / `main` boilerplate
- Typed functions, `struct`, `int[]` arrays
- `int`, `float`, `bool`, `string` literal types
- Value semantics — no user-visible pointers
- Single FFI seam (`callhost`) — the only door to the outside world

**Deliberately absent:** OOP, closures, generics, implicit int↔float coercion, `auto`.

---
## Build
**Requirements:** C++17, CMake ≥ 3.16, Ninja
```bash
git clone https://github.com/abdussamedulutas/saqut
cd saqut
cmake -B build -G Ninja
cmake --build build
```
The binary lands at `build/saqut`.

**Tested on:** Linux (x86-64, Manjaro). macOS and Windows are untested, but the code base has no platform-specific code.

---
## CLI
| Command | What you get |
|---|---|
| `saqut tokens file:src.sqt` | Token stream with positions |
| `saqut ast file:src.sqt` | Full AST as JSON |
| `saqut ast file:src.sqt --optimized` | AST after constant folding + dead-code elimination |
| `saqut symbols file:src.sqt` | Symbol table dump |
| `saqut ir file:src.sqt` | IR instruction dump |
| `saqut run file:src.sqt` | Compile and run via bytecode VM |

Every output is designed to be piped, diffed, or consumed by other tools.

---
## Pipeline
```
Source
│ Lexer + Tokenizer
Tokens ──────────────────── saqut tokens
│ Pratt parser + recursive descent
AST ─────────────────────── saqut ast
│ Symbol collector (two-pass)
Symbol Table ────────────── saqut symbols
│ Type checker + structural validator
Annotated AST
│ Optimization Manager (clone — original untouched)
│ ├─ Constant Folding pass
│ └─ Dead Code Elimination pass
Optimized AST ───────────── saqut ast --optimized
│ IR Generator
IR ──────────────────────── saqut ir
│ Bytecode VM (interpreter loop)
Output ──────────────────── saqut run
```
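The "Pratt parser" step in the diagram is only named, not shown. As a rough illustration of the technique, the sketch below climbs operator precedence over integer arithmetic and evaluates as it goes instead of building AST nodes. `MiniPratt`, `bindingPower`, and the binding-power values are assumptions made for this example and are not taken from saQut's sources. With it, `MiniPratt("1 + 2 * 3").parseExpr(0)` yields 7.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical miniature Pratt parser: evaluates + - * / over integers.
// A real front end would build AST nodes instead of computing values.
struct MiniPratt {
    std::string src;
    std::size_t pos = 0;

    explicit MiniPratt(std::string text) : src(std::move(text)) {}

    void skipSpaces() {
        while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    }

    // Left binding power of an infix operator; 0 means "not an operator".
    static int bindingPower(char op) {
        switch (op) {
            case '+': case '-': return 10;
            case '*': case '/': return 20;
            default:            return 0;
        }
    }

    // Prefix position: integer literal, unary minus, or parenthesised expression.
    long parsePrefix() {
        skipSpaces();
        assert(pos < src.size() && "unexpected end of input");
        if (src[pos] == '-') { ++pos; return -parseExpr(30); }  // binds tighter than * and /
        if (src[pos] == '(') {
            ++pos;
            long inner = parseExpr(0);
            skipSpaces();
            assert(pos < src.size() && src[pos] == ')');
            ++pos;
            return inner;
        }
        assert(std::isdigit(static_cast<unsigned char>(src[pos])));
        long value = 0;
        while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos])))
            value = value * 10 + (src[pos++] - '0');
        return value;
    }

    // Core loop: keep consuming infix operators that bind tighter than minPower.
    long parseExpr(int minPower) {
        long left = parsePrefix();
        for (;;) {
            skipSpaces();
            if (pos >= src.size()) return left;
            char op = src[pos];
            int power = bindingPower(op);
            if (power == 0 || power <= minPower) return left;
            ++pos;
            long right = parseExpr(power);  // same power on the right gives left-associativity
            switch (op) {
                case '+': left += right; break;
                case '-': left -= right; break;
                case '*': left *= right; break;
                case '/': left /= right; break;  // no divide-by-zero check in this sketch
            }
        }
    }
};
```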
The optimizer works on a **clone** of the AST — the original is preserved.
Constant folding and DCE run in a fixpoint loop until nothing changes.
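
The clone-then-iterate behaviour described above can be sketched as follows. This is a minimal illustration over a tiny hypothetical AST, not saQut's implementation: `Node`, `foldConstants`, `eliminateDeadCode`, and `optimize` are names invented for the example. For a tree equivalent to `if (2 * 0) print(1 + 2) else print(4)`, the loop needs two rounds (one that rewrites the tree to `print(4)`, one that confirms nothing is left to do), and the caller's original tree is unchanged.

```cpp
#include <cassert>
#include <initializer_list>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical miniature AST; saQut's real node types are not shown in this README.
struct Node {
    enum Kind { IntLit, Add, Mul, If, Print };
    Kind kind = IntLit;
    long value = 0;                           // used by IntLit only
    std::vector<std::unique_ptr<Node>> kids;  // operands, or [cond, then, else] for If
};
using NodePtr = std::unique_ptr<Node>;

NodePtr make(Node::Kind kind, long value = 0) {
    auto n = std::make_unique<Node>();
    n->kind = kind;
    n->value = value;
    return n;
}
NodePtr lit(long v) { return make(Node::IntLit, v); }
NodePtr node(Node::Kind kind, NodePtr a, NodePtr b = nullptr, NodePtr c = nullptr) {
    auto n = make(kind);
    for (NodePtr* kid : {&a, &b, &c})
        if (*kid) n->kids.push_back(std::move(*kid));
    return n;
}

// Deep copy, so the optimizer never mutates the caller's tree.
NodePtr clone(const Node& n) {
    auto copy = make(n.kind, n.value);
    for (const auto& kid : n.kids) copy->kids.push_back(clone(*kid));
    return copy;
}

// Pass 1: replace Add/Mul of two literals with one literal. Returns true on change.
bool foldConstants(NodePtr& n) {
    bool changed = false;
    for (auto& kid : n->kids) changed = foldConstants(kid) || changed;
    if (n->kind == Node::Add || n->kind == Node::Mul) {
        assert(n->kids.size() == 2);
        if (n->kids[0]->kind == Node::IntLit && n->kids[1]->kind == Node::IntLit) {
            long a = n->kids[0]->value, b = n->kids[1]->value;
            n = make(Node::IntLit, n->kind == Node::Add ? a + b : a * b);
            changed = true;
        }
    }
    return changed;
}

// Pass 2: an If with a literal condition is replaced by the branch that would run.
bool eliminateDeadCode(NodePtr& n) {
    bool changed = false;
    for (auto& kid : n->kids) changed = eliminateDeadCode(kid) || changed;
    if (n->kind == Node::If && n->kids[0]->kind == Node::IntLit) {
        assert(n->kids.size() == 3);
        NodePtr taken = std::move(n->kids[0]->value != 0 ? n->kids[1] : n->kids[2]);
        n = std::move(taken);
        changed = true;
    }
    return changed;
}

struct Optimized { NodePtr ast; int rounds; };

// Runs both passes on a clone until a full round changes nothing (a fixpoint).
Optimized optimize(const Node& original) {
    Optimized result{clone(original), 0};
    bool changed = true;
    while (changed) {
        changed = foldConstants(result.ast);
        changed = eliminateDeadCode(result.ast) || changed;
        ++result.rounds;
    }
    return result;
}
```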

---
## What works right now
| Stage | Status |
|---|---|
| Lexer / Tokenizer | ✅ |
| Pratt parser | ✅ |
| AST + JSON serialization | ✅ |
| Symbol table (two-pass collector) | ✅ |
| Type checker | ✅ |
| Structural validator | ✅ |
| Constant folding (int, bool, logical, unary) | ✅ |
| Dead code elimination | ✅ |
| IR generator + bytecode VM | ✅ |
| `saqut run` executes fibonacci | ✅ |
| `string` type | ✅ |
| `struct` | 🚧 |
| `int[]` arrays | 🚧 |
| Standard library / FFI beyond `print` | 🚧 |
---
## Philosophy in two sentences
**Glass:** every compilation stage (tokens, AST, symbols, IR) is a stable, queryable output that can be inspected and piped on its own.

**Cage:** no user pointers, value semantics, and a single FFI door keep the VM deterministic, which makes record-replay and time-travel debugging a natural extension, not an afterthought.

The long version is in [`docs/architecture.md`](docs/architecture.md).
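
To make the "Cage" argument concrete, here is a minimal sketch of why a single host door makes record-replay cheap. It is not saQut's VM, and the README presents record-replay as a possible extension rather than an existing feature. The opcodes, `HostDoor`, `recording`, and `replaying` are all hypothetical names. The point it illustrates: if every outside input enters through one function, logging that function's answers is enough to reproduce a run without touching the outside world again.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical miniature bytecode; saQut's real IR and opcodes are not shown in this README.
enum class Op { Push, Add, CallHost, Emit };
struct Instr { Op op; long arg = 0; };

// The single door to the outside world: every non-deterministic input comes through here.
using HostDoor = std::function<long(long request)>;

// Runs the program on a small stack machine and returns everything it emitted.
std::vector<long> run(const std::vector<Instr>& code, const HostDoor& host) {
    std::vector<long> stack, output;
    for (const Instr& in : code) {
        switch (in.op) {
            case Op::Push:
                stack.push_back(in.arg);
                break;
            case Op::Add: {
                assert(stack.size() >= 2);
                long b = stack.back();
                stack.pop_back();
                stack.back() += b;
                break;
            }
            case Op::CallHost:
                stack.push_back(host(in.arg));
                break;
            case Op::Emit:
                assert(!stack.empty());
                output.push_back(stack.back());
                stack.pop_back();
                break;
        }
    }
    return output;
}

// Record: forward each request to the real host and log its answer.
HostDoor recording(HostDoor real, std::vector<long>& log) {
    return [real = std::move(real), &log](long request) {
        long answer = real(request);
        log.push_back(answer);
        return answer;
    };
}

// Replay: never call the real host, just hand back the logged answers in order.
HostDoor replaying(const std::vector<long>& log) {
    auto next = std::make_shared<std::size_t>(0);
    return [&log, next](long) {
        assert(*next < log.size());
        return log[(*next)++];
    };
}
```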

---
## Design records
Architectural decisions live in `docs/`:
| File | Coverage |
|---|---|
| [`docs/fikirler.md`](docs/fikirler.md) | ADR-001 to ADR-005: backend strategy, parser, header-only, token, IR |
| [`docs/adr-frontend-analiz.md`](docs/adr-frontend-analiz.md) | ADR-006 to ADR-019: analysis, optimization, execution model, FFI, memory |
| [`docs/roadmap-frontend.md`](docs/roadmap-frontend.md) | Phase-by-phase implementation plan |
| [`docs/architecture.md`](docs/architecture.md) | Full architecture reference (Turkish) |
---
## License
Source-available, commercial use restricted.

Free for: personal use, learning, writing and running saQut programs, internal tooling.

Requires permission for: hosting as a service, embedding sub-components commercially, redistributing as a product.

See [`LICENSE.md`](LICENSE.md) for the full terms.

Commercial licensing: saqutsoftware+gitea@gmail.com