plait: PyTorch for LLM Pipelines

Build, execute, and optimize compound AI systems with the familiar PyTorch programming model. Write normal Python, get automatic parallelism, and improve prompts through backward passes.

If you know PyTorch, you know plait

PyTorch (Training Neural Networks)         plait (Training LLM Pipelines)
nn.Module        Base class                Module      Base class
forward()        Define computation        forward()   Define computation
backward()       Propagate gradients       backward()  Propagate feedback
nn.Parameter     Learnable weights         Parameter   Learnable prompts
torch.fx.Tracer  Capture graph             Tracer      Capture graph
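
The correspondence carries into code. Here is a minimal sketch of a plait Module, using only the API shown elsewhere on this page (Module, LLMInference, and an endpoint alias that gets bound later):

from plait import LLMInference, Module


class Summarizer(Module):
    def __init__(self):
        super().__init__()
        # An LLM call declared like a layer; "fast" is resolved to a
        # concrete endpoint at bind time (see Resource Binding below).
        self.llm = LLMInference(
            alias="fast",
            system_prompt="Summarize the input in one sentence.",
        )

    def forward(self, text: str) -> str:
        # Traced like nn.Module.forward; the dependency graph is
        # captured automatically.
        return self.llm(text)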

Why plait?

Most LLM applications are systems, not single calls. plait moves the complexity into a shared runtime.

Automatic Parallelism

Independent operations run concurrently without boilerplate. No manual asyncio.gather() required.

Automatic DAG Capture

Write normal Python code in forward(). Dependencies are discovered automatically via tracing.

Async-First Execution

Maximum throughput with adaptive backpressure, rate limiting, and automatic retries.
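
Concurrency is configured per endpoint. A minimal sketch; only max_concurrent appears in the full example below, so the commented-out options are hypothetical, illustrative names rather than confirmed plait API:

from plait.resources import OpenAIEndpointConfig, ResourceConfig

resources = ResourceConfig(
    endpoints={
        "fast": OpenAIEndpointConfig(
            model="gpt-4o-mini",
            max_concurrent=20,  # cap on in-flight requests (shown in the full example)
            # Hypothetical knobs, not confirmed plait API:
            # max_retries=3,
            # requests_per_minute=600,
        ),
    }
)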

Runtime Optimization

Backward passes propagate feedback to improve prompts, letting you optimize your pipeline during execution.

Multi-Model Pipelines

Use different models for different steps. Alias-based configuration separates logic from deployment.

Learnable Parameters

Prompts and instructions live in Parameter objects that can be optimized via feedback, as sketched below.
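
A sketch of what a feedback-driven optimization loop could look like. The result.backward() call is an assumption modeled on PyTorch autograd; this page says backward passes propagate feedback but does not pin down the exact call signature:

# Hypothetical: `result.backward(...)` mirrors PyTorch autograd and is
# not confirmed plait API. ExtractAndCompare and resources are defined
# in the full example below.
pipeline = ExtractAndCompare().bind(resources=resources)
result = await pipeline(doc1, doc2)

# Propagate natural-language feedback through the traced graph so
# learnable Parameters (e.g. comparison_style) can be revised.
await result.backward("The comparison missed the pricing differences.")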

Framework Comparison

See how plait compares to other LLM frameworks on key features.

Feature             plait               Pydantic AI  LangGraph  DSPy
Graph definition    Implicit (tracing)  Explicit     Explicit   Implicit
Parallel execution  Automatic           Manual       Explicit   Sequential
Multi-model         Alias-based         Per-agent    Per-node   Global config
Learnable params    Parameter class     No           No         Compile-time
Optimization        Runtime backward    No           No         Compile-time
Execution model     Async-first         Async        Async      Sync-first
PyTorch-like API    Yes                 No           No         No

Benchmark: Extract-and-Compare Pipeline

Real-world performance on a fan-out workflow: 2 parallel extractions (gpt-4o-mini) + 1 comparison (gpt-4o).

Execution Time

  plait        6.9s
  Pydantic AI  8.7s    (+25%)
  LangGraph    10.1s   (+45%)
  DSPy         13.4s   (+94%)

Memory Usage

  plait        0.4 MB
  Pydantic AI  17.6 MB  (44x)
  LangGraph    26.2 MB  (65x)
  DSPy         76.0 MB  (190x)

plait is up to 2x faster and uses up to 99% less memory than alternatives.

Parallelism: Write Sequential, Execute Parallel

plait

class ExtractAndCompare(Module):
    def forward(self, doc1: str, doc2: str) -> str:
        # These two calls are INDEPENDENT - plait runs them in PARALLEL
        facts1 = self.extractor(doc1)
        facts2 = self.extractor(doc2)

        # Formatter is a traced sub-module
        prompt = self.formatter(facts1, facts2)
        return self.comparer(prompt)

# That's it! No async boilerplate, no explicit parallelism.
# The tracer discovers independence and parallelizes automatically.

Pydantic AI

async def extract_and_compare(doc1: str, doc2: str) -> str:
    # Must explicitly use asyncio.gather for parallelism
    facts1, facts2 = await asyncio.gather(
        extractor.run(doc1),
        extractor.run(doc2),
    )

    # Manual string formatting
    prompt = f"Compare:\n{facts1.data}\n\nvs:\n{facts2.data}"
    result = await comparer.run(prompt)
    return result.data

# Requires understanding of async/await patterns.
# Parallelism must be manually identified and implemented.

LangGraph

def fan_out(state: State) -> list[Send]:
    # Must define explicit fan-out function
    return [
        Send("extract", {"doc": state["doc1"], "key": "facts1"}),
        Send("extract", {"doc": state["doc2"], "key": "facts2"}),
    ]

def reduce(results: list[str]) -> str:
    # Must define reducer to collect parallel results
    return "\n\n".join(results)

graph.add_conditional_edges(START, fan_out)
graph.add_node("extract", extract_node)
graph.add_node("compare", compare_node)

# Requires explicit graph construction with Send() and reducers.
# More boilerplate for simple parallel patterns.

DSPy

class ExtractAndCompare(dspy.Module):
    def forward(self, doc1: str, doc2: str) -> str:
        # Runs sequentially - no built-in parallelism
        facts1 = self.extract(document=doc1).facts  # Runs first
        facts2 = self.extract(document=doc2).facts  # Then second

        # Compare after both complete
        result = self.compare(facts1=facts1, facts2=facts2)
        return result.comparison

# DSPy is sync-first. These calls execute one after another.
# No automatic parallelism for independent operations.

See It In Action

A complete pipeline showing automatic parallelism, multi-model configuration, and learnable parameters.

extract_and_compare.py
from plait import Module, LLMInference, Parameter
from plait.resources import OpenAIEndpointConfig, ResourceConfig


class CompareFormatter(Module):
    """Format two fact lists for comparison (traced sub-module)."""

    def forward(self, facts1: str, facts2: str) -> str:
        return f"Compare:\n{facts1}\n\nvs:\n{facts2}"


class ExtractAndCompare(Module):
    """Extract facts from two documents and compare them."""

    def __init__(self):
        super().__init__()
        # Learnable parameter - can be optimized via backward pass
        self.comparison_style = Parameter(
            value="Highlight key similarities and differences.",
            description="Controls the style of comparison output.",
        )
        self.extractor = LLMInference(
            alias="fast",
            system_prompt="Extract the main facts as a bulleted list.",
        )
        self.formatter = CompareFormatter()  # Traced sub-module
        self.comparer = LLMInference(
            alias="smart",
            system_prompt=self.comparison_style,
        )

    def forward(self, doc1: str, doc2: str) -> str:
        # These two calls are INDEPENDENT - plait runs them in PARALLEL
        facts1 = self.extractor(doc1)
        facts2 = self.extractor(doc2)

        # Formatter is a sub-module - participates in tracing
        prompt = self.formatter(facts1, facts2)
        return self.comparer(prompt)


# Configure endpoints separately from module definition
resources = ResourceConfig(
    endpoints={
        "fast": OpenAIEndpointConfig(model="gpt-4o-mini", max_concurrent=20),
        "smart": OpenAIEndpointConfig(model="gpt-4o", max_concurrent=5),
    }
)

# Bind resources and execute
pipeline = ExtractAndCompare().bind(resources=resources)
result = await pipeline(doc1, doc2)
1. Learnable Parameters: Prompts stored as Parameter can be optimized through backward passes with feedback.

2. Automatic Parallelism: Independent calls to extractor are traced and executed concurrently; no async boilerplate needed.

3. Multi-Model Configuration: Aliases like "fast" and "smart" decouple module logic from endpoint configuration.

4. Resource Binding: The same pipeline can run against different endpoints: dev, prod, or self-hosted models.
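
Since the final call is awaited, it needs a running event loop. A minimal entry point for the file above; the document strings stand in for your own inputs:

import asyncio


async def main() -> None:
    pipeline = ExtractAndCompare().bind(resources=resources)
    result = await pipeline(
        "First document text ...",   # doc1
        "Second document text ...",  # doc2
    )
    print(result)


asyncio.run(main())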

Execution DAG

The tracer captures the dependency graph from your forward() method. Independent branches execute in parallel automatically. Sub-modules like formatter participate in tracing.

Input ─┬─► extractor(doc1) ─┬─► formatter ─► comparer ─► Output
       └─► extractor(doc2) ─┘
           (runs in parallel)

Get Started in Seconds

uv add pyplait

Published as pyplait on PyPI; imported as plait.

Requires Python 3.13+
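
A quick smoke test that the install and import name line up, using the same imports as the example above:

# Installed as `pyplait`, imported as `plait`.
from plait import LLMInference, Module, Parameter

print("plait imported OK")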