In C, a macro looks like this:
#define FOO 10            // untyped integral constant
#define SUB(x, y) ((x) - (y)) // parentheses are important!
#define BAZ a             // relies on there being an `a` in context!

int a = FOO;
short b = FOO;
int c = -SUB(2, 3 + 4);
int d = BAZ;
Before compilation, a preprocessor runs which directly substitutes tokens:
int a = 10;                // = 10
short b = 10;              // = 10
int c = -((2) - (3 + 4));  // = -(2 - 7) = 5
int d = a;                 // = 10
C does a direct token-level substitution: 3+4 has the tokens '3' '+' '4'.
Say we had defined SUB like this:
#define SUB(x, y) x - y
int c = -SUB(2, 3 + 4);
This would break terribly! After substitution...
int c = -2 - 3 + 4; // = -1, not 5.
Or if we directly used a variable name in a macro...
#define FOO 10
#define BAZ a // relies on there being an `a` in context!
int a_smith = FOO;
int d = BAZ;
Now, the preprocessor produces this invalid code...
int a_smith = 10; // = 10
int d = a;        // error: `a` is undeclared
C macros also can't be recursive:
#define foo (4 + foo)
int x = foo;
This expands only once; the inner foo is not expanded again:
int x = (4 + foo);
#define FOO 43+FOO
Tokens are processed into an AST
Rest of compilation
A Rust macro looks like this:
macro_rules! incr { // define the macro
    // syntax pattern => replacement code
    ( $x:ident ) => { $x += 1; };
}

let mut x = 0;
incr!(x); // invoke a syntax extension (or macro)
a + b + ( c + d [ 0 ] ) + e
parses into the AST
a + b + ( c + d [ 0 ] ) + e
parses into the token stream
'a' '+' 'b' '+' '(' 'c' '+' 'd' '[' '0' ']' ')' '+' 'e'
Put simply, a macro is just a compile-time pattern match:
macro_rules! mymacro {
    ($pattern1) => {$expansion1};
    ($pattern2) => {$expansion2};
    // ...
}
e.g. this four! macro is super simple:
macro_rules! four {
    // For empty input, produce `1 + 3` as output.
    () => {1 + 3};
}
The expansion is the single expression { 1 + 3 }, so
let eight = 2 * four!();
behaves like
let eight = 2 * (1 + 3);
Macro calls can appear in place of the following syntax kinds, by outputting a valid AST of that kind:
Patterns (e.g. in a match or if let).
Statements (e.g. let x = 4;).
Expressions (e.g. x * (y + z)).
Items (e.g. fn, struct, impl, use).
They cannot appear in place of:
Any valid Rust tokens can appear in the match:
macro_rules! imaginary {
    (elevensies) => {"20ton"};
}
imaginary!(elevensies);
imaginary!(twentington); // compile error: no rule matches this token
In Rust, macros see one token tree as input.
When you call println!("{}", (5+2)), the "{}", (5+2) will get parsed into a token tree, but not fully parsed into an AST.
The token tree can have any set of braces:
macro_rules! imaginary {
    (elevensies) => {"20ton"};
}
imaginary![elevensies];
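As a quick check of the point about delimiters, here is a minimal sketch that invokes the same imaginary! macro with all three bracket styles. The variable names are only for illustration.

```rust
macro_rules! imaginary {
    (elevensies) => { "20ton" };
}

fn main() {
    // The delimiter chosen at the call site does not affect matching.
    let with_parens = imaginary!(elevensies);
    let with_brackets = imaginary![elevensies];
    let with_braces = imaginary! { elevensies };
    println!("{} {} {}", with_parens, with_brackets, with_braces);
}
```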
Portions of the input token tree can be captured:
macro_rules! sub {
    ($e1:expr, $e2:expr) => { ... };
}
Captures are always written as $name:kind.
item: an item, like a function, struct, module, etc.
block: a block (i.e. { some; stuff; here })
stmt: a statement
pat: a pattern
expr: an expression
ty: a type
ident: an identifier
path: a path (e.g. foo, ::std::mem::replace, ...)
meta: a meta item; the things that go inside #[...]
tt: a single token tree
Captures can be substituted back into the expanded tree:
macro_rules! sub {
    ( $e1:expr , $e2:expr ) => { $e1 - $e2 };
}
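To illustrate mixing several capture kinds, here is a small sketch. The macro make_getter! and the generated function names are hypothetical, not part of the original slides; it captures an ident, a ty and an expr and substitutes all three into a function item.

```rust
// Hypothetical macro: generates a zero-argument function from three captures.
macro_rules! make_getter {
    ($name:ident, $t:ty, $value:expr) => {
        fn $name() -> $t {
            $value
        }
    };
}

// Each invocation expands to one item (a function definition).
make_getter!(answer, i32, 6 * 7);
make_getter!(greeting, &'static str, "hello");

fn main() {
    println!("{} {}", answer(), greeting());
}
```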
A capture will always be inserted as a single AST node.
expr will always mean a valid Rust expression.
macro_rules! twice {
    ( $e:expr ) => { { $e; $e } }
}
fn foo() { println!("foo"); }
twice!(foo()); // expands to { foo(); foo() }: prints twice
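Because captures and expansions are single AST nodes, the precedence problem from the C SUB example should not arise. A minimal sketch, reusing the sub! macro from the slides without parentheses in the expansion:

```rust
macro_rules! sub {
    ( $e1:expr , $e2:expr ) => { $e1 - $e2 };
}

fn main() {
    // `3 + 4` is captured as one expression node, and the whole expansion
    // is one node too, so this evaluates as -(2 - (3 + 4)).
    let c = -sub!(2, 3 + 4);
    println!("{}", c);
}
```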
Repetitions have the general form $ ( ... ) sep rep.
$ is a literal dollar token.
( ... ) is the paren-grouped pattern being repeated.
sep is an optional separator token, commonly , or ;.
rep is the required repeat control. This can be either:
*: zero or more repeats.
+: one or more repeats.
macro_rules! repeat {
    // An expr capture may only be followed by `=>`, `,` or `;`,
    // so the separator here is a comma.
    ( $( $elem:expr ),* ) => {
        $( println!("{}", $elem) );*
    }
}
repeat!(1, 2, 3);
Notice that you can insert a separator in the repetition expansion (although in this case it could go inside the expanded expression as well).
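A small sketch of a repetition with a comma separator in the pattern and the joining token placed inside the repeated expansion. The macro name sum! is hypothetical and chosen only for this example.

```rust
// Hypothetical macro: sums one or more comma-separated expressions.
macro_rules! sum {
    ( $( $x:expr ),+ ) => {
        // Expands to `0 + first + second + ...`; the `+` sits inside
        // the repeated part rather than acting as a separator.
        0 $( + $x )+
    };
}

fn main() {
    println!("{}", sum!(1, 2, 3));
    println!("{}", sum!(2 * 3, 4));
}
```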
vec!
macro_rules! vec {
    // ...
}
macro_rules! vec {
    ( $( $elem:expr ),* ) => {
        { // Braces so we output only one AST (block kind)
            let mut v = Vec::new();
            $( // Same syntax to expand a repetition
                v.push($elem); // Expands once for each input rep
            ) * // No sep; zero or more reps
            v // Return v from the block.
        }
    }
}
println!("{:?}", vec![3, 4]);
macro_rules! myvec {
    ( $( $elem:expr ),* ) => {
        {
            let mut v = Vec::new();
            $( v.push($elem); )*
            v
        }
    }
}
println!("{:?}", myvec![3, 4]);
macro_rules! csv {
    // ...
}
print!, then print a newline at the end.
macro_rules! replace_expr {
    ($_t:tt $sub:expr) => {$sub};
}
macro_rules! count_tts {
    ($($tts:tt)*) => {0 $(+ replace_expr!($tts 1))*};
}
We need replace_expr! because you can't expand a repetition match without invoking the repeating variable.
This expands to 0 + 1 + ... + 1, which must be parsed into an AST.
macro_rules! count_tts {
    () => {0usize};
    ($_head:tt $($tail:tt)*) => {1usize + count_tts!($($tail)*)};
}
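To compare the two counting approaches side by side, here is a minimal sketch. The names count_by_sum! and count_by_recursion! are hypothetical, chosen so both variants can live in one file; the bodies follow the two count_tts! definitions.

```rust
macro_rules! replace_expr {
    ($_t:tt $sub:expr) => { $sub };
}

// Variant 1: expands to `0 + 1 + ... + 1`, one `1` per token tree.
macro_rules! count_by_sum {
    ($($tts:tt)*) => { 0usize $(+ replace_expr!($tts 1usize))* };
}

// Variant 2: peels off one token tree per recursive call.
macro_rules! count_by_recursion {
    () => { 0usize };
    ($_head:tt $($tail:tt)*) => { 1usize + count_by_recursion!($($tail)*) };
}

fn main() {
    println!("{}", count_by_sum!(a b c));
    // A parenthesised group counts as a single token tree.
    println!("{}", count_by_recursion!(x (y z) w));
}
```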
#![recursion_limit="64"].
len on it.
The parser can never backtrack. Say we have:
macro_rules! dead_rule {
    ($e:expr) => { ... };
    ($i:ident +) => { ... };
}
If we call it as dead_rule!(x +); it will actually fail.
x + isn't a valid expression, so we might think it would fail on the first match and then try again on the second. It does not: once the parser has started parsing an expr capture, a failure there aborts the whole macro call.
To solve this, we need to put more specific rules first:
macro_rules! dead_rule {
    ($i:ident +) => { ... };
    ($e:expr) => { ... };
}
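With the more specific rule first, both call forms can be checked in a small sketch. The macro name ordered_rule! and the string labels are hypothetical stand-ins for the elided arm bodies.

```rust
// Hypothetical variant of dead_rule! whose arms return a label
// so we can see which rule matched.
macro_rules! ordered_rule {
    ($i:ident +) => { "ident followed by +" };
    ($e:expr) => { "expression" };
}

fn main() {
    // The captured tokens are discarded by the expansion,
    // so `x` does not need to be declared here.
    println!("{}", ordered_rule!(x +));
    println!("{}", ordered_rule!(x + 2));
}
```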
Now, when we call dead_rule!(x +); the first case will match.
If we call dead_rule!(x + 2); we can now fall through to the second case.
After $i:ident +, the parser already knows that this looks like the beginning of an expression, so it can fall through to the second case.
In C, we talked about how a macro can implicitly use (or conflict with) an identifier name in the calling context (#define BAZ a).
Rust macros are partially hygienic.
They are not hygienic with respect to generic types (<T>) or lifetime parameters (<'a>).
macro_rules! using_a {
    ($e:expr) => {
        { let a = 42; $e } // Note the extra braces around the block
    }
}
let four = using_a!(a / 10); // this won't compile - nice!
let four = { let using_a_1232424_a = 42; a / 10 };
But if we want to bind a new variable, it's possible.
macro_rules! using_a {
    ($a:ident, $e:expr) => {
        { let $a = 42; $e } // Note the extra braces around the block
    }
}
let four = using_a!(a, a / 10); // compiles!
let four = { let a = 42; a / 10 };
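A runnable sketch of the identifier-passing version. It assumes nothing beyond the using_a! macro above; the second call with a different name is only there to show that the caller chooses the binding.

```rust
macro_rules! using_a {
    ($a:ident, $e:expr) => {
        { let $a = 42; $e }
    };
}

fn main() {
    // The identifier comes from the call site, so the expression can see it.
    let four = using_a!(a, a / 10);
    println!("{}", four);
    // A different caller-chosen name works the same way.
    let also_four = using_a!(n, n / 10);
    println!("{}", also_four);
}
```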
It's also possible to create identifiers that will be visible outside of the macro call.
macro_rules! let_four {
    () => { let four = 4; } // No extra braces
}
let_four!();
println!("{}", four); // `four` not declared
macro_rules! let_four {
    ($i:ident) => { let $i = 4; } // No extra braces
}
let_four!(myfour);
println!("{}", myfour); // works!
Rust has an unstable feature for debugging macro expansion.
#![feature(trace_macros)]
macro_rules! each_tt {
    () => {};
    ( $_tt:tt $($rest:tt)* ) => { each_tt!( $($rest)* ); };
}
trace_macros!(true);
each_tt!(spim wak plee whum);
trace_macros!(false);
each_tt! { spim wak plee whum }
each_tt! { wak plee whum }
each_tt! { plee whum }
each_tt! { whum }
each_tt! { }
More tips on macro debugging in TLBORM 2.3.4
Macro scoping is unlike everything else in Rust.
macro_rules! X { () => {}; }
mod a { // Or `mod a` could be in `a.rs`.
    X!(); // valid
}
mod a { /* X! undefined here */ }
mod b {
    /* X! undefined here */
    macro_rules! X { () => {}; }
    X!(); // valid
}
mod c { /* X! undefined */ } // They don't leak between mods.
Macros can be exported from modules:
#[macro_use] // outside of the module definition
mod b {
    macro_rules! X { () => {}; }
}
mod c {
    X!(); // valid
}
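A compilable sketch of the #[macro_use] case. The macro x_value! is a hypothetical stand-in for X! that expands to a value, so the effect is observable.

```rust
#[macro_use] // makes macros defined in `b` visible after this module
mod b {
    // Hypothetical macro standing in for X!.
    macro_rules! x_value {
        () => { 10 };
    }
}

mod c {
    pub fn call() -> i32 {
        x_value!() // visible here because `mod b` is marked #[macro_use]
    }
}

fn main() {
    println!("{}", c::call());
}
```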
#[macro_export] in the crate.
Because of the way macros are expanded, "obviously correct" macro invocations like this won't actually work:
macro_rules! expand_to_larch {
    () => { larch };
}
macro_rules! recognise_tree {
    (larch) => { println!("larch") };
    (redwood) => { println!("redwood") };
    ($($other:tt)*) => { println!("dunno??") };
}
recognise_tree!(expand_to_larch!());
-> recognise_tree! { expand_to_larch ! ( ) };
-> println!("dunno??");
This can make it hard to split a macro into several parts.
expand_to_larch ! ( ) won't match an ident, but it will match an expr.
The problem can be worked around by using a callback pattern:
macro_rules! call_with_larch {
    ($callback:ident) => { $callback!(larch) };
}
call_with_larch!(recognise_tree);
-> call_with_larch! { recognise_tree }
-> recognise_tree! { larch }
-> println!("larch");
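The direct call and the callback form can be compared in one self-contained sketch. As an assumption for the sake of the example, the arms return string literals instead of calling println!, so the matched rule is easy to inspect.

```rust
#[allow(unused_macros)] // never expanded: it only ever appears as raw tokens
macro_rules! expand_to_larch {
    () => { larch };
}

macro_rules! recognise_tree {
    (larch) => { "larch" };
    (redwood) => { "redwood" };
    ($($other:tt)*) => { "dunno??" };
}

macro_rules! call_with_larch {
    ($callback:ident) => { $callback!(larch) };
}

fn main() {
    // The inner invocation is passed along unexpanded, so the catch-all arm matches.
    println!("{}", recognise_tree!(expand_to_larch!()));
    // The callback form hands the literal token `larch` to recognise_tree!.
    println!("{}", call_with_larch!(recognise_tree));
}
```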
This is one of the most powerful and useful macro design patterns. It allows for parsing fairly complex grammars.
A tt muncher is a macro which matches a bit at the beginning of its input, then recurses on the remainder of the input.
( $some_stuff:expr $( $tail:tt )* ) =>
macro_rules!.
macro_rules! mixed_rules {
    () => {}; // Base case
    (trace $name:ident ; $( $tail:tt )*) => {
        {
            println!(concat!(stringify!($name), " = {:?}"), $name);
            mixed_rules!($($tail)*); // Recurse on the tail of the input
        }
    };
    (trace $name:ident = $init:expr ; $( $tail:tt )*) => {
        {
            let $name = $init;
            println!(concat!(stringify!($name), " = {:?}"), $name);
            mixed_rules!($($tail)*); // Recurse on the tail of the input
        }
    };
}
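For a tt muncher whose result is easy to check, here is a minimal sketch. The macro apply_ops! and its inc/dec keywords are hypothetical and not from the slides; each step consumes one leading keyword and recurses on the remaining token trees, carrying an accumulator expression.

```rust
// Hypothetical tt muncher: applies `inc` / `dec` steps to a starting value.
macro_rules! apply_ops {
    // Base case: no operations left, yield the accumulated expression.
    ($acc:expr ;) => { $acc };
    // Munch one `inc`, then recurse on the tail.
    ($acc:expr ; inc $($tail:tt)*) => { apply_ops!($acc + 1 ; $($tail)*) };
    // Munch one `dec`, then recurse on the tail.
    ($acc:expr ; dec $($tail:tt)*) => { apply_ops!($acc - 1 ; $($tail)*) };
}

fn main() {
    let result = apply_ops!(10 ; inc inc dec);
    println!("{}", result);
}
```

Each recursive call sees a shorter tail, so expansion stops once only the accumulator and the separator remain.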
proc_macro::TokenStream
derive
traits.Foo
:fn(TokenStream) -> TokenStream
with an attribute