Compilation Pipeline Specification
This document defines the complete compilation pipeline for the Phaser programming language, detailing the five mandatory phases, their interfaces, data flow, and coordination mechanisms.
This is a compiler implementation document. For language design and user-facing features, see the docs directory.
Overview
The Phaser compiler implements a strict 5-phase pipeline that processes source code through distinct, well-defined stages. Each phase has clear responsibilities, inputs, and outputs, ensuring separation of concerns and enabling incremental compilation.
Pipeline Architecture
---
title: Compilation phases
config:
  layout: dagre
---
flowchart TB
    A["Source Code (.ph files)"] --> B("Lexer")
    B -- Token Stream --> C("Parser")
    C -- Abstract Syntax Tree (AST) --> D("Semantic Analysis")
    D -- Semantically Analyzed AST + Symbol Tables --> E("Comptime")
    E -- "AST with Compile-time Evaluation Results" --> F("Codegen")
    F -- "Target Code (Assembly, LLVM IR, etc.)" --> G["Target Output"]
Phase Interfaces
Phase 1: Lexical Analysis
Input: Raw source text (&str or SourceFile)
Output: Token stream (TokenStream)
pub struct LexerPhase;

impl CompilerPhase for LexerPhase {
    type Input = SourceFile;
    type Output = TokenStream;
    type Error = LexicalError;

    fn execute(&mut self, input: Self::Input) -> PhaserResult<Self::Output> {
        let mut lexer = Lexer::new(input);
        lexer.tokenize()
    }
}

pub struct TokenStream {
    pub tokens: Vec<Token>,
    pub source_map: SourceMap,
    pub errors: Vec<PhaserError>,
}

Responsibilities:
- Convert source text into tokens
- Handle string and numeric literal parsing
- Track source positions for error reporting
- Process comments and whitespace as trivia
- Detect and report lexical errors
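The responsibilities above can be illustrated with a minimal sketch. It is not the Phaser lexer: `TokenKind`, `Token`, `LexError`, and `tokenize` are simplified, hypothetical stand-ins, and the symbol set is an assumption made for the example. It shows position tracking, whitespace handled as trivia, and lexical errors that are recorded without stopping tokenization.

```rust
// Illustrative only: simplified stand-ins, not the real Phaser lexer types.
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Ident(String),
    Int(i64),
    Symbol(char),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub kind: TokenKind,
    pub offset: usize, // byte offset of the token's first character
}

#[derive(Debug, Clone, PartialEq)]
pub struct LexError {
    pub offset: usize,
    pub ch: char,
}

/// Tokenizes `src`, skipping whitespace and recording (not aborting on) bad characters.
pub fn tokenize(src: &str) -> (Vec<Token>, Vec<LexError>) {
    let mut tokens = Vec::new();
    let mut errors = Vec::new();
    let mut chars = src.char_indices().peekable();

    while let Some(&(offset, ch)) = chars.peek() {
        if ch.is_whitespace() {
            chars.next(); // whitespace is trivia
        } else if ch.is_ascii_digit() {
            let mut value: i64 = 0;
            while let Some(&(_, d)) = chars.peek() {
                match d.to_digit(10) {
                    Some(n) => {
                        // Saturate instead of overflowing on very long literals.
                        value = value.saturating_mul(10).saturating_add(n as i64);
                        chars.next();
                    }
                    None => break,
                }
            }
            tokens.push(Token { kind: TokenKind::Int(value), offset });
        } else if ch.is_alphabetic() || ch == '_' {
            let mut text = String::new();
            while let Some(&(_, c)) = chars.peek() {
                if c.is_alphanumeric() || c == '_' {
                    text.push(c);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token { kind: TokenKind::Ident(text), offset });
        } else if "+-*/=;(){}".contains(ch) {
            tokens.push(Token { kind: TokenKind::Symbol(ch), offset });
            chars.next();
        } else {
            // Lexical error: record it and continue with the next character.
            errors.push(LexError { offset, ch });
            chars.next();
        }
    }
    (tokens, errors)
}
```

Returning errors alongside the tokens, rather than failing on the first bad character, mirrors the `errors` field carried by `TokenStream`.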
Phase 2: Syntactic Analysis
Input: Token stream (TokenStream)
Output: Abstract Syntax Tree plus diagnostics (ParseResult, which wraps the Program)
pub struct ParserPhase;

impl CompilerPhase for ParserPhase {
    type Input = TokenStream;
    type Output = ParseResult;
    type Error = SyntaxError;

    fn execute(&mut self, input: Self::Input) -> PhaserResult<Self::Output> {
        let mut parser = Parser::new(input);
        parser.parse_program()
    }
}

pub struct ParseResult {
    pub program: Program,
    pub errors: Vec<PhaserError>,
    pub warnings: Vec<PhaserError>,
}

Responsibilities:
- Build Abstract Syntax Tree from tokens
- Implement error recovery for continued parsing
- Validate syntactic correctness
- Handle operator precedence and associativity
- Preserve source location information in AST nodes
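Operator precedence and associativity are often handled with precedence climbing. The sketch below assumes that technique purely for illustration; the document does not state which algorithm the Phaser parser uses. `Tok`, `Expr`, and `ExprParser` are hypothetical names, and the grammar is reduced to integers and four binary operators.

```rust
// Illustrative sketch: `Tok`, `Expr`, and `ExprParser` are hypothetical names.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tok {
    Num(i64),
    Plus,
    Minus,
    Star,
    Slash,
}

#[derive(Debug, PartialEq)]
pub enum Expr {
    Num(i64),
    Binary(Box<Expr>, Tok, Box<Expr>),
}

pub struct ExprParser {
    tokens: Vec<Tok>,
    pos: usize,
}

impl ExprParser {
    pub fn new(tokens: Vec<Tok>) -> Self {
        Self { tokens, pos: 0 }
    }

    /// Binding power of a binary operator; `None` for non-operators.
    fn precedence(tok: Tok) -> Option<u8> {
        match tok {
            Tok::Plus | Tok::Minus => Some(1),
            Tok::Star | Tok::Slash => Some(2),
            Tok::Num(_) => None,
        }
    }

    fn parse_primary(&mut self) -> Result<Expr, String> {
        match self.tokens.get(self.pos).copied() {
            Some(Tok::Num(n)) => {
                self.pos += 1;
                Ok(Expr::Num(n))
            }
            other => Err(format!(
                "expected number at token {}, found {:?}",
                self.pos, other
            )),
        }
    }

    /// Parses a chain of operators whose precedence is at least `min_prec`.
    pub fn parse_expr(&mut self, min_prec: u8) -> Result<Expr, String> {
        let mut lhs = self.parse_primary()?;
        while let Some(op) = self.tokens.get(self.pos).copied() {
            let prec = match Self::precedence(op) {
                Some(p) if p >= min_prec => p,
                _ => break,
            };
            self.pos += 1;
            // Requiring `prec + 1` on the right makes every operator left-associative.
            let rhs = self.parse_expr(prec + 1)?;
            lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
        }
        Ok(lhs)
    }
}
```

Calling `parse_expr(1)` on the tokens for `1 + 2 * 3` groups the multiplication first, and `8 - 3 - 2` groups as `(8 - 3) - 2`.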
Phase 3: Semantic Analysis
Input: AST with parse diagnostics (ParseResult)
Output: Analyzed AST with symbol tables (AnalysisResult)
pub struct AnalysisPhase;

impl CompilerPhase for AnalysisPhase {
    type Input = ParseResult;
    type Output = AnalysisResult;
    type Error = SemanticError;

    fn execute(&mut self, input: Self::Input) -> PhaserResult<Self::Output> {
        let mut analyzer = SemanticAnalyzer::new();
        analyzer.analyze(input.program)
    }
}

pub struct AnalysisResult {
    pub program: Program,
    pub symbol_table: SymbolTable,
    pub type_table: TypeTable,
    pub dependency_graph: DependencyGraph,
    pub errors: Vec<PhaserError>,
    pub warnings: Vec<PhaserError>,
}

Responsibilities:
- Name resolution and scope analysis
- Type checking and inference
- Borrow checking and lifetime analysis
- Dead code detection
- Dependency analysis
- Build symbol and type tables
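Name resolution and scope analysis usually rest on a stack of scopes. The following is a minimal sketch under that assumption; `ScopeStack` is a hypothetical type, and representing a symbol's type as a plain string is a simplification that says nothing about how `SymbolTable` is actually structured.

```rust
use std::collections::HashMap;

// Illustrative sketch: `ScopeStack` and its string-typed symbols are hypothetical.
pub struct ScopeStack {
    // Maps name -> type name; the innermost scope is last.
    scopes: Vec<HashMap<String, String>>,
}

impl ScopeStack {
    pub fn new() -> Self {
        // Start with a single global scope.
        Self { scopes: vec![HashMap::new()] }
    }

    pub fn enter_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }

    pub fn exit_scope(&mut self) {
        // Never pop the global scope.
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    /// Declares `name` in the innermost scope; redeclaring in the same scope is an error.
    pub fn declare(&mut self, name: &str, ty: &str) -> Result<(), String> {
        let scope = self.scopes.last_mut().expect("global scope always exists");
        if scope.contains_key(name) {
            return Err(format!("`{}` is already declared in this scope", name));
        }
        scope.insert(name.to_string(), ty.to_string());
        Ok(())
    }

    /// Resolves `name` by searching from the innermost scope outward.
    pub fn resolve(&self, name: &str) -> Option<&str> {
        self.scopes
            .iter()
            .rev()
            .find_map(|scope| scope.get(name).map(String::as_str))
    }
}
```

Searching innermost-first is what lets an inner declaration shadow an outer one, while leaving the outer binding intact once the inner scope is exited.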
Phase 4: Compile-time Evaluation
Input: Analyzed AST (AnalysisResult)
Output: AST with comptime results (ComptimeResult)
pub struct ComptimePhase;

impl CompilerPhase for ComptimePhase {
    type Input = AnalysisResult;
    type Output = ComptimeResult;
    type Error = ComptimeError;

    fn execute(&mut self, input: Self::Input) -> PhaserResult<Self::Output> {
        let mut evaluator = ComptimeEvaluator::new(input.symbol_table);
        evaluator.evaluate(input.program)
    }
}

pub struct ComptimeResult {
    pub program: Program,
    pub comptime_values: ComptimeValueTable,
    pub generated_code: Vec<GeneratedItem>,
    pub meta_expansions: Vec<MetaExpansion>,
    pub errors: Vec<PhaserError>,
}

Responsibilities:
- Execute compile-time expressions and functions
- Perform constant folding and propagation
- Execute meta-programming directives
- Generate code from templates
- Validate comptime constraints and limits
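Constant folding, one of the responsibilities listed above, can be sketched on a miniature expression tree. `CExpr` and `fold` are hypothetical names for this example only; the real phase operates on `Program` and is not shown in this document. The sketch assumes that an addition or multiplication that would overflow is left unfolded rather than wrapped.

```rust
// Illustrative sketch: `CExpr` is a hypothetical miniature AST.
#[derive(Debug, Clone, PartialEq)]
pub enum CExpr {
    Const(i64),
    Var(String),
    Add(Box<CExpr>, Box<CExpr>),
    Mul(Box<CExpr>, Box<CExpr>),
}

/// Folds constant subexpressions bottom-up.
/// Operations that would overflow `i64` are left in place unevaluated.
pub fn fold(expr: CExpr) -> CExpr {
    match expr {
        CExpr::Add(l, r) => match (fold(*l), fold(*r)) {
            (CExpr::Const(a), CExpr::Const(b)) => match a.checked_add(b) {
                Some(v) => CExpr::Const(v),
                None => CExpr::Add(Box::new(CExpr::Const(a)), Box::new(CExpr::Const(b))),
            },
            (l, r) => CExpr::Add(Box::new(l), Box::new(r)),
        },
        CExpr::Mul(l, r) => match (fold(*l), fold(*r)) {
            (CExpr::Const(a), CExpr::Const(b)) => match a.checked_mul(b) {
                Some(v) => CExpr::Const(v),
                None => CExpr::Mul(Box::new(CExpr::Const(a)), Box::new(CExpr::Const(b))),
            },
            (l, r) => CExpr::Mul(Box::new(l), Box::new(r)),
        },
        // Constants and variables are already in simplest form.
        leaf => leaf,
    }
}
```

Folding `2 + 3 * 4` yields a single constant, while `x + 2 * 5` keeps the variable and folds only the right-hand side.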
Phase 5: Code Generation
Input: Comptime-evaluated AST (ComptimeResult)
Output: Target code (CodegenResult)
pub struct CodegenPhase {
    // The phase stores its target so that `execute` can read it
    target_config: TargetConfig,
}

impl CompilerPhase for CodegenPhase {
    type Input = ComptimeResult;
    type Output = CodegenResult;
    type Error = CodegenError;

    fn execute(&mut self, input: Self::Input) -> PhaserResult<Self::Output> {
        // Cloned because `execute` only borrows the phase (assumes `TargetConfig: Clone`)
        let mut codegen = CodeGenerator::new(self.target_config.clone());
        codegen.generate(input.program)
    }
}

pub struct CodegenResult {
    pub output: TargetOutput,
    pub debug_info: DebugInfo,
    pub metadata: CompilationMetadata,
}

pub enum TargetOutput {
    Assembly(String),
    LlvmIr(String),
    ObjectCode(Vec<u8>),
    Executable(Vec<u8>),
}

Responsibilities:
- Generate target-specific code
- Perform target-specific optimizations
- Generate debug information
- Handle linking and symbol resolution
- Produce final executable or library
Pipeline Coordination
Compiler Driver
pub struct Compiler {
    config: CompilerConfig,
    source_manager: SourceManager,
    error_reporter: ErrorReporter,
}

impl Compiler {
    pub fn compile(&mut self, sources: Vec<SourceFile>) -> PhaserResult<CompilationResult> {
        let mut results = Vec::new();
        for source in sources {
            let result = self.compile_single(source)?;
            results.push(result);
        }
        self.link_results(results)
    }

    fn compile_single(&mut self, source: SourceFile) -> PhaserResult<ModuleResult> {
        // Capture the ID before `source` is moved into the lexer phase
        // (assumes the ID type is `Copy`)
        let source_id = source.id;

        // Phase 1: Lexical Analysis
        let mut lexer_phase = LexerPhase::new();
        let tokens = lexer_phase.execute(source)?;

        // Phase 2: Syntactic Analysis
        let mut parser_phase = ParserPhase::new();
        let parse_result = parser_phase.execute(tokens)?;

        // Phase 3: Semantic Analysis
        let mut analysis_phase = AnalysisPhase::new();
        let analysis_result = analysis_phase.execute(parse_result)?;

        // Phase 4: Compile-time Evaluation
        let mut comptime_phase = ComptimePhase::new();
        let comptime_result = comptime_phase.execute(analysis_result)?;

        // Phase 5: Code Generation
        // Cloned because the config cannot be moved out of `&mut self`
        let mut codegen_phase = CodegenPhase::new(self.config.target.clone());
        let codegen_result = codegen_phase.execute(comptime_result)?;

        Ok(ModuleResult {
            source_id,
            output: codegen_result,
        })
    }
}

Phase Interface Trait
pub trait CompilerPhase {
    type Input;
    type Output;
    type Error: Into<PhaserError>;

    fn execute(&mut self, input: Self::Input) -> PhaserResult<Self::Output>;
    fn phase_name(&self) -> &'static str;
    fn can_recover_from_error(&self) -> bool { false }
    fn cleanup(&mut self) {}
}

Data Flow and Dependencies
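Because each phase's output type is the next phase's input type, the data flow between phases can be checked by the type system. A minimal sketch of that idea, assuming a simplified `Phase` trait with `String` errors in place of `CompilerPhase` and `PhaserError`; `Chain`, `SplitWords`, and `CountWords` are hypothetical names used only for illustration:

```rust
// Illustrative sketch: a simplified stand-in for `CompilerPhase` that uses
// `String` errors instead of `PhaserError`.
pub trait Phase {
    type Input;
    type Output;
    fn execute(&mut self, input: Self::Input) -> Result<Self::Output, String>;
}

/// Runs `first`, then feeds its output to `second`.
pub struct Chain<A, B> {
    pub first: A,
    pub second: B,
}

impl<A, B> Phase for Chain<A, B>
where
    A: Phase,
    // The compiler rejects a chain whose types do not line up.
    B: Phase<Input = A::Output>,
{
    type Input = A::Input;
    type Output = B::Output;

    fn execute(&mut self, input: Self::Input) -> Result<Self::Output, String> {
        let intermediate = self.first.execute(input)?; // stop at the first failure
        self.second.execute(intermediate)
    }
}

// Two toy phases: split text into words, then count them.
pub struct SplitWords;

impl Phase for SplitWords {
    type Input = String;
    type Output = Vec<String>;

    fn execute(&mut self, input: String) -> Result<Vec<String>, String> {
        if input.trim().is_empty() {
            return Err("empty source".to_string());
        }
        Ok(input.split_whitespace().map(str::to_string).collect())
    }
}

pub struct CountWords;

impl Phase for CountWords {
    type Input = Vec<String>;
    type Output = usize;

    fn execute(&mut self, input: Vec<String>) -> Result<usize, String> {
        Ok(input.len())
    }
}
```

A `Chain` is itself a `Phase`, so chains nest: five phases can be composed pairwise into a single pipeline value with one `execute` call.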
Inter-Phase Data
pub struct CompilationContext {
    pub source_manager: SourceManager,
    pub symbol_tables: HashMap<ModuleId, SymbolTable>,
    pub type_tables: HashMap<ModuleId, TypeTable>,
    pub dependency_graph: GlobalDependencyGraph,
    pub comptime_cache: ComptimeCache,
    pub target_config: TargetConfig,
}

Incremental Compilation Support
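Incremental compilation depends on knowing which modules are affected by a change. As a rough illustration of that lookup, the sketch below walks a reverse-dependency map breadth-first. The map layout, the string module names, and the `find_affected` helper are assumptions made for the example and are not the `DependencyTracker` API.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns every module that transitively depends on one of `changed`,
/// including the changed modules themselves.
/// `dependents` maps a module to the modules that import it directly
/// (a hypothetical layout chosen for this sketch).
pub fn find_affected(
    dependents: &HashMap<String, Vec<String>>,
    changed: &[&str],
) -> HashSet<String> {
    let mut affected: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();

    for &module in changed {
        // `insert` returns false for duplicates, so each module is queued once.
        if affected.insert(module.to_string()) {
            queue.push_back(module.to_string());
        }
    }

    while let Some(module) = queue.pop_front() {
        if let Some(users) = dependents.get(&module) {
            for user in users {
                if affected.insert(user.clone()) {
                    queue.push_back(user.clone());
                }
            }
        }
    }
    affected
}
```

The visited set doubles as the result, which also keeps the walk from looping if the dependency data happens to contain a cycle.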
pub struct IncrementalCompiler {
    cache: CompilationCache,
    dependency_tracker: DependencyTracker,
}

impl IncrementalCompiler {
    pub fn compile_incremental(
        &mut self,
        changed_files: Vec<SourceFile>,
    ) -> PhaserResult<CompilationResult> {
        // Determine which modules need recompilation
        let affected_modules = self.dependency_tracker
            .find_affected_modules(&changed_files);

        // Recompile only affected modules
        for module in affected_modules {
            if !self.cache.is_valid(&module) {
                self.recompile_module(module)?;
            }
        }

        self.link_cached_results()
    }
}

Error Handling Across Phases
Error Propagation
pub struct PhaseError {
    pub phase: PhaseName,
    pub error: PhaserError,
    pub can_continue: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub enum PhaseName {
    Lexer,
    Parser,
    Analysis,
    Comptime,
    Codegen,
}

impl Compiler {
    fn handle_phase_error(&mut self, error: PhaseError) -> PhaserResult<()> {
        self.error_reporter.report_error(&error.error);

        if error.can_continue {
            // Continue with next phase using error recovery
            Ok(())
        } else {
            // Fatal error, stop compilation
            Err(error.error)
        }
    }
}

Error Recovery Strategies
- Lexer Recovery: Skip invalid characters, continue tokenizing
- Parser Recovery: Synchronize to statement boundaries, insert missing tokens
- Analysis Recovery: Use error types, continue with partial information
- Comptime Recovery: Skip failed evaluations, use default values
- Codegen Recovery: Generate placeholder code, emit warnings
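The parser strategy above (synchronizing to statement boundaries) can be sketched as panic-mode recovery. This example assumes a toy grammar of `let <ident> = <num>;` statements; the token type `T` and the function `parse_lets` are hypothetical and far simpler than the real parser.

```rust
// Illustrative sketch of panic-mode recovery over a hypothetical token type.
#[derive(Debug, Clone, PartialEq)]
pub enum T {
    Let,
    Ident(String),
    Eq,
    Num(i64),
    Semi,
}

/// Parses a sequence of `let <ident> = <num>;` statements.
/// On a malformed statement it records an error, skips past the next `;`,
/// and resumes parsing from there.
pub fn parse_lets(tokens: &[T]) -> (Vec<(String, i64)>, Vec<String>) {
    let mut bindings = Vec::new();
    let mut errors = Vec::new();
    let mut pos = 0;

    while pos < tokens.len() {
        match &tokens[pos..] {
            [T::Let, T::Ident(name), T::Eq, T::Num(n), T::Semi, ..] => {
                bindings.push((name.clone(), *n));
                pos += 5;
            }
            _ => {
                errors.push(format!("malformed statement at token {}", pos));
                // Synchronize: skip to the statement boundary, then step over it.
                while pos < tokens.len() && tokens[pos] != T::Semi {
                    pos += 1;
                }
                pos += 1;
            }
        }
    }
    (bindings, errors)
}
```

For the token sequence of `let x = ; let y = 42;`, the first statement produces one error and the second still yields the binding for `y`, so a single mistake does not hide later diagnostics.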
Performance Optimization
Parallel Compilation
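Parallel compilation is only safe when a module's dependencies finish before the module itself starts. One way to express that is to group modules into batches by dependency level. The sketch below does so with a simple repeated scan; `dependency_batches` and the string-keyed map are assumptions for illustration and do not describe how `DependencyGraph::topological_sort` or `into_batches` are implemented.

```rust
use std::collections::{HashMap, HashSet};

/// Groups modules into batches so that each module's dependencies all sit in
/// earlier batches. Modules inside one batch are independent of each other.
/// `deps` maps each module to the modules it depends on; every module must
/// appear as a key. A cycle or a dependency on an unknown module is an error.
pub fn dependency_batches(
    deps: &HashMap<String, Vec<String>>,
) -> Result<Vec<Vec<String>>, String> {
    let mut remaining: HashMap<&str, Vec<&str>> = deps
        .iter()
        .map(|(m, ds)| (m.as_str(), ds.iter().map(String::as_str).collect()))
        .collect();
    let mut done: HashSet<&str> = HashSet::new();
    let mut batches: Vec<Vec<String>> = Vec::new();

    while !remaining.is_empty() {
        // A module is ready once all of its dependencies have been scheduled.
        let mut ready: Vec<&str> = remaining
            .iter()
            .filter(|(_, ds)| ds.iter().all(|d| done.contains(d)))
            .map(|(m, _)| *m)
            .collect();
        if ready.is_empty() {
            return Err("dependency cycle or unknown module".to_string());
        }
        ready.sort(); // deterministic order within a batch
        for m in &ready {
            remaining.remove(m);
            done.insert(*m);
        }
        batches.push(ready.into_iter().map(str::to_string).collect());
    }
    Ok(batches)
}
```

Each batch can be compiled concurrently, provided the driver waits for one batch to finish before starting the next.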
pub struct ParallelCompiler {
    thread_pool: ThreadPool,
    dependency_graph: DependencyGraph,
}

impl ParallelCompiler {
    // Declared `async` because the body awaits the per-batch futures
    pub async fn compile_parallel(&mut self, modules: Vec<Module>) -> PhaserResult<CompilationResult> {
        let compilation_order = self.dependency_graph.topological_sort();
        let mut results = Vec::new();

        // Batches run in dependency order; modules within a batch are independent
        for batch in compilation_order.into_batches() {
            let batch_futures: Vec<_> = batch.into_iter()
                .map(|module| self.compile_module_async(module))
                .collect();

            // Wait for the whole batch before starting modules that depend on it
            results.extend(futures::future::join_all(batch_futures).await);
        }

        self.combine_results(results)
    }
}

Caching Strategy
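The core of a per-phase cache is a lookup-or-compute step. The sketch below keys entries by a hash of the source text, which is an assumption made for this example; the document does not specify how `CacheKey` is derived. `PhaseCache` is a hypothetical type, and the hit and miss counters exist only to make the behavior observable.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Illustrative sketch: `PhaseCache` is hypothetical and keyed by a content hash.
pub struct PhaseCache<V> {
    entries: HashMap<u64, V>,
    pub hits: usize,
    pub misses: usize,
}

impl<V: Clone> PhaseCache<V> {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Hashes the source text; identical text yields the same key within a run.
    fn key(source: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        source.hash(&mut hasher);
        hasher.finish()
    }

    /// Returns the cached value for `source`, or runs `compute` and stores its result.
    /// Failed computations are not cached, so they are retried on the next call.
    pub fn get_or_compute<F>(&mut self, source: &str, compute: F) -> Result<V, String>
    where
        F: FnOnce(&str) -> Result<V, String>,
    {
        let key = Self::key(source);
        if let Some(value) = self.entries.get(&key).cloned() {
            self.hits += 1;
            return Ok(value);
        }
        let value = compute(source)?;
        self.misses += 1;
        self.entries.insert(key, value.clone());
        Ok(value)
    }
}
```

A 64-bit hash can collide, so a production cache would likely also compare the stored source or use a stronger digest; that check is omitted here for brevity.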
pub struct CompilationCache {
    lexer_cache: HashMap<SourceId, TokenStream>,
    parser_cache: HashMap<SourceId, ParseResult>,
    analysis_cache: HashMap<ModuleId, AnalysisResult>,
    comptime_cache: HashMap<ComptimeKey, ComptimeValue>,
}

impl CompilationCache {
    pub fn get_or_compute<T, F>(&mut self, key: &CacheKey, compute: F) -> PhaserResult<T>
    where
        F: FnOnce() -> PhaserResult<T>,
        T: Clone + Serialize + DeserializeOwned,
    {
        if let Some(cached) = self.get(key) {
            Ok(cached)
        } else {
            let result = compute()?;
            self.insert(key.clone(), result.clone());
            Ok(result)
        }
    }
}

Configuration and Targets
Compiler Configuration
pub struct CompilerConfig {
    pub target: TargetConfig,
    pub optimization_level: OptimizationLevel,
    pub debug_info: DebugInfoLevel,
    pub warnings: WarningConfig,
    pub features: FeatureConfig,
    pub paths: PathConfig,
}

pub struct TargetConfig {
    pub architecture: Architecture,
    pub operating_system: OperatingSystem,
    pub environment: Environment,
    pub code_model: CodeModel,
}

#[derive(Debug, Clone, PartialEq)]
pub enum OptimizationLevel {
    None,       // -O0
    Size,       // -Os
    Speed,      // -O2
    Aggressive, // -O3
}

Multi-Target Support
pub trait CodegenBackend {
    fn generate_code(&mut self, program: &Program) -> PhaserResult<TargetOutput>;
    fn target_info(&self) -> &TargetInfo;
    fn supported_features(&self) -> &[Feature];
}

pub struct LlvmBackend {
    context: LlvmContext,
    module: LlvmModule,
    target_machine: TargetMachine,
}

pub struct NativeBackend {
    assembler: Assembler,
    linker: Linker,
}

Testing and Validation
Pipeline Testing
#[cfg(test)]
mod pipeline_tests {
    use super::*;

    #[test]
    fn test_complete_pipeline() {
        let source = r#"
            fn main() -> i32 {
                let x = 42;
                return x;
            }
        "#;

        let mut compiler = Compiler::new(CompilerConfig::default());
        let result = compiler.compile_string(source).unwrap();

        assert!(result.is_success());
        assert!(!result.output.is_empty());
    }

    #[test]
    fn test_error_recovery() {
        let source = r#"
            fn main() {
                let x = ; // Syntax error
                let y = 42; // Should still compile
            }
        "#;

        let mut compiler = Compiler::new(CompilerConfig::default());
        let result = compiler.compile_string(source);

        assert!(result.has_errors());
        assert!(result.has_partial_output());
    }
}

Integration Testing
- End-to-end compilation tests
- Cross-phase error propagation tests
- Performance regression tests
- Memory usage validation
- Incremental compilation correctness
Future Extensions
Planned Enhancements
- Language Server Integration: Real-time compilation for IDE support
- Hot Reloading: Runtime code replacement for development
- Cross-Compilation: Support for multiple target platforms
- Plugin System: Extensible compilation phases
- Distributed Compilation: Network-based parallel compilation
- Advanced Caching: Persistent cross-session caching
- Profile-Guided Optimization: Runtime feedback for optimization
Extensibility Points
- Custom lexer extensions for domain-specific syntax
- Parser plugins for language extensions
- Analysis passes for custom linting rules
- Comptime function libraries
- Backend plugins for new target platforms