plait: PyTorch for LLM Pipelines

Build, execute, and optimize compound AI systems with the familiar PyTorch programming model. Write normal Python, get automatic parallelism, and improve prompts through backward passes.

If you know PyTorch, you know plait

PyTorch (Training Neural Networks)         plait (Training LLM Pipelines)
nn.Module        Base class                Module      Base class
forward()        Define computation        forward()   Define computation
backward()       Propagate gradients       backward()  Propagate feedback
nn.Parameter     Learnable weights         Parameter   Learnable prompts
torch.fx.Tracer  Capture graph             Tracer      Capture graph
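
The correspondence carries into code. Here is a minimal sketch of a plait Module, using only the API shown elsewhere on this page (Module, LLMInference, and an endpoint alias that gets bound later):

from plait import LLMInference, Module


class Summarizer(Module):
    def __init__(self):
        super().__init__()
        # An LLM call declared like a layer; "fast" is resolved to a
        # concrete endpoint at bind time (see Resource Binding below).
        self.llm = LLMInference(
            alias="fast",
            system_prompt="Summarize the input in one sentence.",
        )

    def forward(self, text: str) -> str:
        # Traced like nn.Module.forward; the dependency graph is
        # captured automatically.
        return self.llm(text)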

Why plait?

Most LLM applications are systems, not single calls. plait moves the complexity into a shared runtime.

Automatic Parallelism

Independent operations run concurrently without boilerplate. No manual asyncio.gather() required.

Automatic DAG Capture

Write normal Python code in forward(). Dependencies are discovered automatically via tracing.

Async-First Execution

Maximum throughput with adaptive backpressure, rate limiting, and automatic retries.
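
Concurrency is configured per endpoint. A minimal sketch; only max_concurrent appears in the full example below, so the commented-out options are hypothetical, illustrative names rather than confirmed plait API:

from plait.resources import OpenAIEndpointConfig, ResourceConfig

resources = ResourceConfig(
    endpoints={
        "fast": OpenAIEndpointConfig(
            model="gpt-4o-mini",
            max_concurrent=20,  # cap on in-flight requests (shown in the full example)
            # Hypothetical knobs, not confirmed plait API:
            # max_retries=3,
            # requests_per_minute=600,
        ),
    }
)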

Runtime Optimization

Backward passes propagate feedback to improve prompts, letting you optimize your pipeline during execution.

Multi-Model Pipelines

Use different models for different steps. Alias-based configuration separates logic from deployment.

Learnable Parameters

Prompts and instructions live in Parameter objects that can be optimized via feedback, as sketched below.
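
A sketch of what a feedback-driven optimization loop could look like. The result.backward() call is an assumption modeled on PyTorch autograd; this page says backward passes propagate feedback but does not pin down the exact call signature:

# Hypothetical: `result.backward(...)` mirrors PyTorch autograd and is
# not confirmed plait API. ExtractAndCompare and resources are defined
# in the full example below.
pipeline = ExtractAndCompare().bind(resources=resources)
result = await pipeline(doc1, doc2)

# Propagate natural-language feedback through the traced graph so
# learnable Parameters (e.g. comparison_style) can be revised.
await result.backward("The comparison missed the pricing differences.")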

Framework Comparison

See how plait compares to other LLM frameworks on key features.

Feature             plait               Pydantic AI  LangGraph  DSPy
Graph definition    Implicit (tracing)  Explicit     Explicit   Implicit
Parallel execution  Automatic           Manual       Explicit   Sequential
Multi-model         Alias-based         Per-agent    Per-node   Global config
Learnable params    Parameter class     No           No         Compile-time
Optimization        Runtime backward    No           No         Compile-time
Execution model     Async-first         Async        Async      Sync-first
PyTorch-like API    Yes                 No           No         No

Benchmark: Extract-and-Compare Pipeline

Real-world performance on a fan-out workflow: 2 parallel extractions (gpt-4o-mini) + 1 comparison (gpt-4o).

Execution Time

  plait        6.9s
  Pydantic AI  8.7s    (+25%)
  LangGraph    10.1s   (+45%)
  DSPy         13.4s   (+94%)

Memory Usage

  plait        0.4 MB
  Pydantic AI  17.6 MB  (44x)
  LangGraph    26.2 MB  (65x)
  DSPy         76.0 MB  (190x)

plait is up to 2x faster and uses up to 99% less memory than alternatives.

Parallelism: Write Sequential, Execute Parallel

plait

class ExtractAndCompare(Module):
    def forward(self, doc1: str, doc2: str) -> str:
        # These two calls are INDEPENDENT - plait runs them in PARALLEL
        facts1 = self.extractor(doc1)
        facts2 = self.extractor(doc2)

        # Formatter is a traced sub-module
        prompt = self.formatter(facts1, facts2)
        return self.comparer(prompt)

# That's it! No async boilerplate, no explicit parallelism.
# The tracer discovers independence and parallelizes automatically.

Pydantic AI

async def extract_and_compare(doc1: str, doc2: str) -> str:
    # Must explicitly use asyncio.gather for parallelism
    facts1, facts2 = await asyncio.gather(
        extractor.run(doc1),
        extractor.run(doc2),
    )

    # Manual string formatting
    prompt = f"Compare:\n{facts1.data}\n\nvs:\n{facts2.data}"
    result = await comparer.run(prompt)
    return result.data

# Requires understanding of async/await patterns.
# Parallelism must be manually identified and implemented.

LangGraph

def fan_out(state: State) -> list[Send]:
    # Must define explicit fan-out function
    return [
        Send("extract", {"doc": state["doc1"], "key": "facts1"}),
        Send("extract", {"doc": state["doc2"], "key": "facts2"}),
    ]

def reduce(results: list[str]) -> str:
    # Must define reducer to collect parallel results
    return "\n\n".join(results)

graph.add_conditional_edges(START, fan_out)
graph.add_node("extract", extract_node)
graph.add_node("compare", compare_node)

# Requires explicit graph construction with Send() and reducers.
# More boilerplate for simple parallel patterns.

DSPy

class ExtractAndCompare(dspy.Module):
    def forward(self, doc1: str, doc2: str) -> str:
        # Runs sequentially - no built-in parallelism
        facts1 = self.extract(document=doc1).facts  # Runs first
        facts2 = self.extract(document=doc2).facts  # Then second

        # Compare after both complete
        result = self.compare(facts1=facts1, facts2=facts2)
        return result.comparison

# DSPy is sync-first. These calls execute one after another.
# No automatic parallelism for independent operations.

See It In Action

A complete pipeline showing automatic parallelism, multi-model configuration, and learnable parameters.

extract_and_compare.py
from plait import Module, LLMInference, Parameter
from plait.resources import OpenAIEndpointConfig, ResourceConfig


class CompareFormatter(Module):
    """Format two fact lists for comparison (traced sub-module)."""

    def forward(self, facts1: str, facts2: str) -> str:
        return f"Compare:\n{facts1}\n\nvs:\n{facts2}"


class ExtractAndCompare(Module):
    """Extract facts from two documents and compare them."""

    def __init__(self):
        super().__init__()
        # Learnable parameter - can be optimized via backward pass
        self.comparison_style = Parameter(
            value="Highlight key similarities and differences.",
            description="Controls the style of comparison output.",
        )
        self.extractor = LLMInference(
            alias="fast",
            system_prompt="Extract the main facts as a bulleted list.",
        )
        self.formatter = CompareFormatter()  # Traced sub-module
        self.comparer = LLMInference(
            alias="smart",
            system_prompt=self.comparison_style,
        )

    def forward(self, doc1: str, doc2: str) -> str:
        # These two calls are INDEPENDENT - plait runs them in PARALLEL
        facts1 = self.extractor(doc1)
        facts2 = self.extractor(doc2)

        # Formatter is a sub-module - participates in tracing
        prompt = self.formatter(facts1, facts2)
        return self.comparer(prompt)


# Configure endpoints separately from module definition
resources = ResourceConfig(
    endpoints={
        "fast": OpenAIEndpointConfig(model="gpt-4o-mini", max_concurrent=20),
        "smart": OpenAIEndpointConfig(model="gpt-4o", max_concurrent=5),
    }
)

# Bind resources and execute
pipeline = ExtractAndCompare().bind(resources=resources)
result = await pipeline(doc1, doc2)
1. Learnable Parameters: Prompts stored as Parameter can be optimized through backward passes with feedback.

2. Automatic Parallelism: Independent calls to extractor are traced and executed concurrently; no async boilerplate needed.

3. Multi-Model Configuration: Aliases like "fast" and "smart" decouple module logic from endpoint configuration.

4. Resource Binding: The same pipeline can run against different endpoints: dev, prod, or self-hosted models.
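
Since the final call is awaited, it needs a running event loop. A minimal entry point for the file above; the document strings stand in for your own inputs:

import asyncio


async def main() -> None:
    pipeline = ExtractAndCompare().bind(resources=resources)
    result = await pipeline(
        "First document text ...",   # doc1
        "Second document text ...",  # doc2
    )
    print(result)


asyncio.run(main())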

Execution DAG

The tracer captures the dependency graph from your forward() method. Independent branches execute in parallel automatically. Sub-modules like formatter participate in tracing.

Input ─┬─► extractor(doc1) ─┬─► formatter ─► comparer ─► Output
       └─► extractor(doc2) ─┘
           (runs in parallel)

Get Started in Seconds

uv add pyplait

Published as pyplait on PyPI; imported as plait.

Requires Python 3.13+
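
A quick smoke test that the install and import name line up, using the same imports as the example above:

# Installed as `pyplait`, imported as `plait`.
from plait import LLMInference, Module, Parameter

print("plait imported OK")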