# Macros!

### CIS 198 Lecture 7

---
## What Are Macros?

- In C, a macro looks like this:

```c
#define FOO 10  // untyped integral constant
#define SUB(x, y) ((x) - (y))  // parentheses are important!
#define BAZ a  // relies on there being an `a` in context!

int a = FOO;
short b = FOO;
int c = -SUB(2, 3 + 4);
int d = BAZ;
```

- Before compilation, a _preprocessor_ runs which directly substitutes tokens:

```c
    int a = 10;               // = 10
    short b = 10;             // = 10
    int c = -((2) - (3 + 4)); // = -(2 - 7) = 5
    int d = a;                // = 10
    ```

---
## Why C Macros Suck

- C does a direct _token-level_ substitution.
    - e.g. `3+4` has the tokens `'3' '+' '4'`.
    - The preprocessor has no idea what variables, types, operators, numbers, or
      anything else actually _mean_.

---
## Why C Macros Suck¹

- Say we had defined `SUB` like this:

```c
#define SUB(x, y) x - y

int c = -SUB(2, 3 + 4);
```

- This would break terribly! After substitution...

```c
    int c = -2 - 3 + 4;  // = -1, not 5.
    ```

¹ [GCC Docs: Macro Pitfalls](https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html)

---
## Why C Macros Suck

- Or if we directly used a variable name in a macro...

```c
#define FOO 10
#define BAZ a  // relies on there being an `a` in context!

int a_smith = FOO;
int d = BAZ;
```

- Now, the preprocessor produces this invalid code...

```c
    int a_smith = 10;  // = 10
    int d = a;         // error: `a` is undeclared
    ```
---
## Why C Macros Suck

- C macros also can't be recursive:

```c
#define foo (4 + foo)

int x = foo;
```

- This expands only once; the inner `foo` is left alone:

```c
int x = (4 + foo);  // the inner `foo` is not expanded again
```

- (This particular example is silly, but recursion _is_ useful.)

---
## C Macro Processing

1. Lexing: source code to token stream
```C
#define FOO 4
3+FOO
```
    - '#define' 'FOO' '4'
    - '3' '+' 'FOO'
2. Preprocessor: macros are expanded.
    - '3' '+' 'FOO' => '3' '+' '4'
3. Tokens are processed into an AST

![](img/34.png)

4. Rest of compilation
    - syntax checking, type checking, etc

---
## Rust Macros

- A Rust macro looks like this:

```rust
macro_rules! incr {  // define the macro
    // syntax pattern => replacement code
    ( $x:ident ) => { $x += 1; };
}

let mut x = 0;
incr!(x);  // invoke a syntax extension (or macro)
```

---
## AST

- An AST (abstract syntax tree) contains semantic information about what kind of
  operation the syntax represents.
- `a + b + ( c + d [ 0 ] ) + e` parses into the AST

![](img/ast_bigger.png)

---

## Token Trees

- In Rust, the lexer produces token _trees_:
    - An intermediate between token streams and ASTs.
    - Only represents brace-based nested structures ('(', '[',
      '{').
- `a + b + ( c + d [ 0 ] ) + e` parses into the token _stream_
      - `'a' '+' 'b' '+' '(' 'c' '+' 'd' '[' '0' ']' ')' '+' 'e'`
- Which then parses into the token _tree_

![](img/tt.png)

---
## Macro Rules

- Put simply, a macro is just a compile-time pattern match:

```rust
    macro_rules! mymacro {
        ($pattern1) => {$expansion1};
        ($pattern2) => {$expansion2};
        // ...
    }
    ```

- e.g. this `four!` macro is super simple:

```rust
    macro_rules! four {
        // For empty input, produce `1 + 3` as output.
        () => {1 + 3};
    }
    ```
    - Matches only empty input, and always expands to `1 + 3`.

---
## Macro Expansion

```rust
let eight = 2 * four!();
```

- This parses into the following AST:

![](img/four-ast.png)

---
## Expansions

- Translated into Rust syntax, this looks like:

```rust
let eight = 2 * (1 + 3);
```

- With _this_ AST:

![](img/four-ast-expanded.png)
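
- A minimal sketch to check this, reusing `four!` from the earlier slide. If expansion were textual, as in C, `2 * 1 + 3` would be 5:

```rust
macro_rules! four {
    // For empty input, produce `1 + 3` as output.
    () => { 1 + 3 };
}

fn main() {
    // The macro call is a single expression node, so this is 2 * (1 + 3).
    let eight = 2 * four!();
    assert_eq!(eight, 8);
    println!("{}", eight);
}
```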

---
## Example: Try!

### https://is.gd/oMi3j

---
## Entire Process

1. Lexing: Source code to token _stream_ to token _tree_
2. Parsing: Token tree to AST
    - Each macro invocation is a single AST node which contains a token tree
3. Macro expansion:
    - Macro AST nodes replaced by their expanded AST nodes
4. Rest of compilation (type checking, borrow checking, etc)

- Thus, macros must output valid, contextually-correct Rust.
    - Enforced because AST context is already known when expansion takes place.

---
## Expansions

- Macro calls can appear in place of the following
  syntax kinds, by outputting a valid AST of that kind:
    - Patterns (e.g. in a `match` or `if let`).
    - Statements (e.g. `let x = 4;`).
    - Expressions (e.g. `x * (y + z)`).
    - Items (e.g. `fn`, `struct`, `impl`, `use`).

- They _cannot_ appear in place of:
    - Identifiers, match arms, or struct fields.
    - Types were originally excluded too, but macros in type position
      have been allowed since Rust 1.13.
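
- A sketch of one macro per allowed position. All the macro and function names here (`make_answer_fn`, `digit`, `double`, `bind_one`, `is_digit`) are made up for illustration:

```rust
// Item position: expands to a whole function definition.
macro_rules! make_answer_fn {
    ($name:ident) => {
        fn $name() -> i32 { 42 }
    };
}
make_answer_fn!(answer);

// Pattern position: expands to a range pattern.
macro_rules! digit {
    () => { 0..=9 };
}

// Expression position.
macro_rules! double {
    ($e:expr) => { $e * 2 };
}

// Statement position: expands to a `let` statement.
macro_rules! bind_one {
    ($name:ident) => { let $name = 1; };
}

fn is_digit(n: i32) -> bool {
    match n {
        digit!() => true,
        _ => false,
    }
}

fn main() {
    bind_one!(one);
    assert_eq!(one, 1);
    assert_eq!(double!(one + 1), 4);
    assert_eq!(answer(), 42);
    assert!(is_digit(7) && !is_digit(12));
}
```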

---
## Macro Rules

- Any valid Rust tokens can appear in the match:

```rust
macro_rules! imaginary {
    (elevensies) => {"20ton"};
}

imaginary!(elevensies);   // matches the only rule
imaginary!(twentington);  // compile error: no rule matches this token
```

- In Rust, macros see _one_ token tree as input.
    - When you do `println!("{}", (5+2))`, the `"{}", (5+2)` will
      get parsed into a token tree, but _not_ fully parsed into an AST.
- The token tree can have any set of braces:

```rust
macro_rules! imaginary {
    (elevensies) => {"20ton"};
}

imaginary![elevensies];
```

---
## Captures

- Portions of the input token tree can be _captured_:

```rust
    macro_rules! sub {
        ($e1:expr, $e2:expr) => { ... };
    }
    ```

- Captures are always written as `$name:kind`.
    - Possible kinds are:
        - `item`: an item, like a function, struct, module, etc.
        - `block`: a block (i.e. `{ some; stuff; here }`)
        - `stmt`: a statement
        - `pat`: a pattern
        - `expr`: an expression
        - `ty`: a type
        - `ident`: an identifier
        - `path`: a path (e.g. `foo`, `::std::mem::replace`, ...)
        - `meta`: a meta item; the things that go inside `#[...]`
        - `tt`: a single token tree
        - `lifetime`, `literal`, `vis`: added in later Rust versions
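
- A small sketch exercising a few of these kinds. The macros `make_const`, `labelled`, and `show_tt` are hypothetical examples, not part of any library:

```rust
// `ident`, `ty` and `expr` captures: define a constant.
macro_rules! make_const {
    ($name:ident : $t:ty = $value:expr) => {
        const $name: $t = $value;
    };
}

// `expr` and `block` captures: pair a label with a block's value.
macro_rules! labelled {
    ($label:expr, $body:block) => { ($label, $body) };
}

// `tt` capture: turn a single token tree into a string.
macro_rules! show_tt {
    ($t:tt) => { stringify!($t) };
}

make_const!(LIMIT: u32 = 10 * 2);

fn main() {
    assert_eq!(LIMIT, 20);
    assert_eq!(labelled!("sum", { 1 + 2 }), ("sum", 3));
    assert_eq!(show_tt!(hello), "hello");
}
```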

---
## Captures

- Captures can be substituted back into the expanded tree

```rust
    macro_rules! sub {
        ( $e1:expr , $e2:expr ) => { $e1 - $e2 };
    }
    ```

- A capture will always be inserted as a **single** AST node.
    - For example, `expr` will always mean a valid Rust expression.
    - This means we're no longer vulnerable to C's substitution
      problem (the invalid order of operations).
    - Multiple expansions will still cause multiple evaluations:

```rust
macro_rules! twice {
    ( $e:expr ) => { { $e; $e } }
}

fn foo() { println!("foo"); }

twice!(foo());  // expands to { foo(); foo() }: prints twice
```
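
- Both points can be checked with a short sketch. It reuses `sub!` and `twice!` from these slides; the `calls` counter is only there to make the double evaluation observable:

```rust
macro_rules! sub {
    ( $e1:expr , $e2:expr ) => { $e1 - $e2 };
}

macro_rules! twice {
    ( $e:expr ) => { { $e; $e } }
}

fn main() {
    // `3 + 4` is substituted as one expression node, so this is
    // -(2 - (3 + 4)), not -2 - 3 + 4 as in the C example.
    let c = -sub!(2, 3 + 4);
    assert_eq!(c, 5);

    // The captured expression is evaluated once per expansion site.
    let mut calls = 0;
    twice!(calls += 1);
    assert_eq!(calls, 2);
}
```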

---
## Repetitions

- If we want to match a list, a variable number of arguments, etc.,
  we can't do this with the rules we've seen so far.
- _Repetitions_ allow us to define repeating subpatterns.
- These have the form `$ ( ... ) sep rep`.
    - `$` is a literal dollar token.
    - `( ... )` is the paren-grouped pattern being repeated.
    - `sep` is an *optional* separator token.
        - Usually, this will be `,` or `;`.
    - `rep` is the *required* repeat control. This can be one of:
        - `*` zero or more repeats.
        - `+` one or more repeats.
        - `?` zero or one repeat (newer Rust versions; takes no separator).
- The same pattern is used in the output arm.
    - The separator doesn't have to be the same.

---
## Repetitions

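- Takes a space-delimited list of token trees and prints each element.

```rust
macro_rules! repeat {
  // `tt` rather than `expr`: an `expr` capture may only be followed by
  // `=>`, `,` or `;`, so space-separated expressions cannot be matched.
  ( $( $elem:tt )* ) => {
    $( println!("{}", $elem) );*
  }
}

repeat!(1 2 3);
```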

???

Notice you can insert a separator in the repetition expansion (although in this
case it can go inside the expanded expression as well).

---
## Exercise! vec

- Implement `vec!`
    - Takes zero-or-more elements and inserts them into a vector.

```rust
macro_rules! vec {
    // ...
}
```

---
## Exercise! vec

```rust
macro_rules! vec {
    ( $( $elem:expr ),* ) => {
        { // Braces so we output only one AST (block kind)
            let mut v = Vec::new();

            $(                 // Same syntax to expand a repetition
                v.push($elem); // Expands once for each input rep
            ) *                // No sep; zero or more reps

            v                  // Return v from the block.
        }
    }
}
println!("{:?}", vec![3, 4]);
```

---
## Exercise! vec

- Condensed:

```rust
macro_rules! myvec {
    ( $( $elem:expr ),* ) => {
        {
            let mut v = Vec::new();
            $( v.push($elem); )*
            v
        }
    }
}
println!("{:?}", myvec![3, 4]);
```

---
## Exercise! csv

- Take a list of variable names (identifiers) and print them in CSV format.
    - One line, each element separated by commas.

```rust
macro_rules! csv {
    // ...
}
```

- There's no way to count the number of captures in a repetition...

---
## Counting

- But you can use a repetition twice in a macro expansion:
    - Once to generate the format string
    - Once to insert arguments
- Or you can iteratively call `print!`, then print a newline at the end.
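
- One possible solution, sketched along the lines of the iterative option. As an assumption made for easier testing, it builds a `String` instead of calling `print!` directly, and it requires at least one identifier:

```rust
macro_rules! csv {
    // At least one identifier, comma-separated.
    ( $first:ident $( , $rest:ident )* ) => {{
        // `line` is local to the expansion; hygiene keeps it from
        // clashing with the caller's variables.
        let mut line = format!("{}", $first);
        $(
            line.push(',');
            line.push_str(&format!("{}", $rest));
        )*
        line
    }};
}

fn main() {
    let x = 1;
    let y = "two";
    let z = 3.5;
    println!("{}", csv!(x, y, z));
    assert_eq!(csv!(x, y, z), "1,two,3.5");
}
```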

---

## Counting

- You can try to count repetition matches by generating an arithmetic
  expression:

```rust
macro_rules! replace_expr {
    ($_t:tt $sub:expr) => {$sub};
}

macro_rules! count_tts {
    ($($tts:tt)*) => {0 $(+ replace_expr!($tts 1))*};
}
```

- You need `replace_expr!` because you can't expand a repetition match without
  invoking the repeating variable.
- This works, but only up to values of 500 or so...
- This expands to the token stream `0 + 1 + ... + 1`, which must be parsed into
  an AST.

---
## Counting

```rust
macro_rules! count_tts {
    () => {0usize};
    ($_head:tt $($tail:tt)*) => {1usize + count_tts!($($tail)*)};
}
```

- Rust allows macro recursion (or nested macros)
- The compiler will keep expanding macros until there are none left in the AST
  (or the recursion limit is hit).
- The compiler's recursion limit can be changed with
  `#![recursion_limit="64"]`.
    - 64 was the default when these slides were written; recent compilers
      default to 128.
    - This applies to all recursive compiler operations, including
      auto-dereferencing and macro expansion.
---
## Counting

- The _fastest_ way is to count by creating an array literal, and evaluating
  `len` on it.
- Because of the way this is parsed, it doesn't create an _unbalanced_ AST like
  the arithmetic expression does.
- Tested to work with about 10k tokens.
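
- A sketch of the array-literal approach, following the version described in TLBORM. It reuses the `replace_expr!` helper from the earlier counting slide:

```rust
macro_rules! replace_expr {
    ($_t:tt $sub:expr) => { $sub };
}

macro_rules! count_tts {
    ($($tts:tt)*) => {
        // One `()` per input token tree, then ask the slice for its length.
        <[()]>::len(&[ $( replace_expr!($tts ()) ),* ])
    };
}

fn main() {
    assert_eq!(count_tts!(), 0);
    assert_eq!(count_tts!(a b c), 3);
    // A delimited group counts as a single token tree.
    assert_eq!(count_tts!(a (b c) [d]), 3);
}
```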

---
## Matching

- Macro rules are matched in order.
- The parser can never backtrack. Say we have:

```rust
    macro_rules! dead_rule {
        ($e:expr) => { ... };
        ($i:ident +) => { ... };
    }
    ```

- If we call it as `dead_rule!(x +);`, it will actually fail.
    - `x +` isn't a valid expression, so we might think it
      would fail on the first match and then try again on the second.
    - This doesn't happen!
    - Instead, since it _starts_ out looking like an expression,
      it commits to that match case.
        - When it turns out not to work, it can't _backtrack_
          on what it's parsed already, to try again.
          Instead it just fails.

---
## Matching

- To solve this, we need to put more specific rules first:

```rust
    macro_rules! dead_rule {
        ($i:ident +) => { ... };
        ($e:expr) => { ... };
    }
    ```

- Now, when we call `dead_rule!(x +);`, the first case will match.
- If we called `dead_rule!(x + 2);`, we can now fall through to the
  second case.
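    - Why does this work?
    - Failing to match a literal token, or finding unexpected extra
      input, is an ordinary mismatch, so the next rule is tried.
      The first rule matches `x` and `+`, then finds `2` where the input
      should end, and falls through to the second case.
    - Only a failure _inside_ a fragment parser such as `expr` is fatal.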
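
- A runnable sketch of the reordered rules. `classify!` is a hypothetical stand-in for `dead_rule!` that returns a string naming the rule that matched. `x` does not need to exist, because the capture is never used in the expansion:

```rust
macro_rules! classify {
    // More specific rule first.
    ($i:ident +) => { "identifier followed by +" };
    ($e:expr) => { "expression" };
}

fn main() {
    assert_eq!(classify!(x +), "identifier followed by +");
    assert_eq!(classify!(x + 2), "expression");
}
```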

---
## Macro Hygiene

- In C, we talked about how a macro can implicitly use (or conflict
  with) an identifier name in the calling context (`#define BAZ a`).

- Rust macros are _partially hygienic_.
    - Hygienic with regard to most identifiers.
        - These identifiers get a special context internal to
          the macro expansion.
    - NOT hygienic: generic types (`<T>`), lifetime parameters (`<'a>`).

```rust
macro_rules! using_a {
    ($e:expr) => {     { let a = 42;  $e }  }
} // Note extra braces ^                 ^

let four = using_a!(a / 10); // this won't compile - nice!
```

- We can imagine that this expands to something like:

```rust
    let four = { let using_a_1232424_a = 42; a / 10 };
    ```
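
- Hygiene also protects the caller's variables. In this sketch (`add_ten!` is a made-up macro), the macro's `a` and the caller's `a` stay separate:

```rust
macro_rules! add_ten {
    ($e:expr) => {{
        let a = 10; // the macro's own `a`
        a + $e
    }};
}

fn main() {
    let a = 1; // the caller's `a`
    // `a * 2` refers to the caller's `a`, so this is 10 + (1 * 2).
    assert_eq!(add_ten!(a * 2), 12);
}
```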

---
## Macro Hygiene

- But if we _want_ to bind a new variable, it's possible.
    - If a token comes in as an input to the macro, then it is
      part of the caller's context, not the macro's context.

```rust
macro_rules! using_a {
    ($a:ident, $e:expr) => {  { let $a = 42;  $e }  }
}        // Note extra braces ^                  ^

let four = using_a!(a, a / 10); // compiles!
```

- This expands to:

```rust
    let four = { let a = 42; a / 10 };
    ```

---
## Macro Hygiene

- It's also possible to create identifiers that will be visible
  outside of the macro call.
    - This won't work due to hygiene:

```rust
macro_rules! let_four {
    () => { let four = 4; }
}       // ^ No extra braces

let_four!();
println!("{}", four); // `four` not declared
```
    - But this will:

```rust
macro_rules! let_four {
    ($i:ident) => { let $i = 4; }
}               // ^ No extra braces

let_four!(myfour);
println!("{}", myfour); // works!
```

---
## Macro Debugging

- Rust has an unstable feature for debugging macro expansion.
    - Especially recursive macro expansions.

```rust
#![feature(trace_macros)]
macro_rules! each_tt {
    () => {};
    ( $_tt:tt $($rest:tt)* ) => { each_tt!( $($rest)* ); };
}

trace_macros!(true);
each_tt!(spim wak plee whum);
trace_macros!(false);
```

- This will cause the compiler to print:

```
    each_tt! { spim wak plee whum }
    each_tt! { wak plee whum }
    each_tt! { plee whum }
    each_tt! { whum }
    each_tt! {  }
    ```

- More tips on macro debugging in [TLBORM 2.3.4](https://danielkeep.github.io/tlborm/book/mbe-min-debugging.html)

---
## Macro Scoping

- Macro scoping is unlike everything else in Rust.
    - Macros are immediately visible in submodules:

```rust
macro_rules! X { () => {}; }

mod a {  // Or `mod a` could be in `a.rs`.
    X!(); // valid
}
```

- Macros are only defined _after_ they appear in a module:

```rust
mod a { /* X! undefined here */ }

mod b {
    /* X! undefined here */
    macro_rules! X { () => {}; }
    X!(); // valid
}

mod c { /* X! undefined */ } // They don't leak between mods.
```

---
## Macro Scoping

- Macros can be exported from modules:

```rust
#[macro_use]  // outside of the module definition
mod b {
    macro_rules! X { () => {}; }
}

mod c {
    X!(); // valid
}
```

- Or from crates, using `#[macro_export]` in the crate.
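
- A single-file sketch of `#[macro_use]` on a module. The module names `shapes` and `user` and the `square!` macro are invented for the example:

```rust
#[macro_use]
mod shapes {
    macro_rules! square {
        ($x:expr) => { $x * $x };
    }
}

// Declared after `shapes`, so `square!` is in scope here.
mod user {
    pub fn area() -> i32 {
        square!(3)
    }
}

fn main() {
    assert_eq!(user::area(), 9);
    // `1 + 1` is a single node, so this is (1 + 1) * (1 + 1).
    assert_eq!(square!(1 + 1), 4);
}
```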

---
## Macro Callbacks

- Because of the way macros are expanded, "obviously correct"
  macro invocations like this won't actually work:

```rust
macro_rules! expand_to_larch {
    () => { larch };
}

macro_rules! recognise_tree {
    (larch) => { println!("larch") };
    (redwood) => { println!("redwood") };
    ($($other:tt)*) => { println!("dunno??") };
}

recognise_tree!(expand_to_larch!());
```

- This will be expanded like so:

```rust
-> recognise_tree!{ expand_to_larch ! ( ) };
-> println!("dunno??");
```

- Which will match the third pattern, not the first.

---
## Macro Callbacks

- This can make it hard to split a macro into several parts.
    - This isn't always a problem - `expand_to_larch ! ( )` won't match
      an `ident`, but it _will_ match an `expr`.

- The problem can be worked around by using a _callback_ pattern:

```rust
macro_rules! call_with_larch {
    ($callback:ident) => { $callback!(larch) };
}

call_with_larch!(recognise_tree);
```

- This expands like this:

```rust
    -> call_with_larch! { recognise_tree }
    -> recognise_tree! { larch }
    -> println!("larch");
    ```
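
- A runnable sketch of both behaviours. As an assumption made so the result can be asserted, `recognise_tree!` here returns a string instead of printing it:

```rust
// Never actually expanded below: it only appears as raw tokens.
#[allow(unused_macros)]
macro_rules! expand_to_larch {
    () => { larch };
}

macro_rules! recognise_tree {
    (larch) => { "larch" };
    (redwood) => { "redwood" };
    ($($other:tt)*) => { "dunno??" };
}

macro_rules! call_with_larch {
    ($callback:ident) => { $callback!(larch) };
}

fn main() {
    // The inner macro call is seen as the tokens `expand_to_larch ! ( )`.
    assert_eq!(recognise_tree!(expand_to_larch!()), "dunno??");
    // The callback pattern hands `larch` to the named macro directly.
    assert_eq!(call_with_larch!(recognise_tree), "larch");
}
```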

---
## Macro TT Munchers

- This is one of the most powerful and useful macro design patterns.
  It allows for parsing fairly complex grammars.

- A _tt muncher_ is a macro which matches a bit at the beginning of
  its input, then recurses on the remainder of the input.
    - `( $some_stuff:expr, $( $tail:tt )* ) =>`
    - Usually needed for any kind of actual language grammar.
    - Can only match against literals and grammar constructs
      which can be captured by `macro_rules!`.
    - Cannot match unbalanced groups.
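
- A minimal tt muncher, sketched for a made-up command language of `add N` and `sub N`. The macro name `total!` and the grammar are assumptions for illustration:

```rust
macro_rules! total {
    // Base case: nothing left to munch.
    () => { 0i32 };
    // Munch `add N` from the front, then recurse on the tail.
    (add $n:literal $($tail:tt)*) => { $n + total!($($tail)*) };
    // Munch `sub N` from the front, then recurse on the tail.
    (sub $n:literal $($tail:tt)*) => { -$n + total!($($tail)*) };
}

fn main() {
    // Expands to 5 + (-2 + (10 + 0)).
    assert_eq!(total!(add 5 sub 2 add 10), 13);
    assert_eq!(total!(), 0);
}
```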

---
## Macro TT Munchers

```rust
macro_rules! mixed_rules {
    () => {}; // Base case
    (trace $name:ident ; $( $tail:tt )*) => {
        {
            println!(concat!(stringify!($name), " = {:?}"), $name);
            mixed_rules!($($tail)*);  // Recurse on the tail of the input
        }
    };
    (trace $name:ident = $init:expr ; $( $tail:tt )*) => {
        {
            let $name = $init;
            println!(concat!(stringify!($name), " = {:?}"), $name);
            mixed_rules!($($tail)*);  // Recurse on the tail of the input
        }
    };
}
```

---
## Macros Rule! Mostly!

- Macros are pretty great - but not perfect.
    - Macro hygiene isn't perfect.
    - The scope of where you can use a macro is weird.
    - Handling crates inside of exported macros is weird.
    - It's impossible to construct entirely new identifiers
      (e.g. by concatenating two other identifiers).
    - ...
- A new, incompatible macro system may appear in future Rust.
    - This would be a new syntax for writing syntax extensions.

---
## Rust Macros from the Bottom Up

- Almost all material stolen from Daniel Keep's _excellent_ book:
    - [The Little Book of Rust Macros][tlborm] (TLBORM).
    - This section from [Chapter 2][tlborm-2].

[tlborm]: https://danielkeep.github.io/tlborm/book/
[tlborm-2]: https://danielkeep.github.io/tlborm/book/mbe-README.html

---
## Macros 1.1

- Procedural macros!
- In the works for a long time.
- Pushed by demand from major crates like [Serde].

[serde]: https://github.com/serde-rs/serde

---
## Procedural Macros

- Procedural macros let you _execute code_ during compilation.
    - Write functions that take token streams and output token streams.
    - Of the type `proc_macro::TokenStream`
- Regular macros use pattern matching only.
- This already existed in the form of (nightly-only) syntax extensions.
- Macros 1.1 has an updated interface
    - Token streams only, no interaction with the AST

---
## Custom Derive

- Allows you to write your own `derive`-able traits.
- For a trait `Foo`:
    - Write a function `fn(TokenStream) -> TokenStream` with the attribute
      `#[proc_macro_derive(Foo)]`
- Already implemented in Serde!