Build, execute, and optimize compound AI systems with the familiar PyTorch programming model. Write normal Python, get automatic parallelism, and improve prompts through backward passes.
| PyTorch | plait |
|---|---|
| `nn.Module` (base class) | `Module` (base class) |
| `forward()` (define computation) | `forward()` (define computation) |
| `backward()` (propagate gradients) | `backward()` (propagate feedback) |
| `nn.Parameter` (learnable weights) | `Parameter` (learnable prompts) |
| `torch.fx.Tracer` (capture graph) | `Tracer` (capture graph) |
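To make the mapping concrete, here is a minimal sketch of a plait module. It uses only names that appear in the full example further down the page (`Module`, `Parameter`, `LLMInference`); the `Summarizer` class and its prompt text are illustrative, not part of the library.

```python
from plait import Module, LLMInference, Parameter

class Summarizer(Module):
    """Illustrative module: one learnable prompt, one LLM call."""

    def __init__(self):
        super().__init__()
        # Learnable prompt, the analogue of nn.Parameter
        self.style = Parameter(
            value="Summarize in three bullet points.",
            description="Controls the summary style.",
        )
        self.llm = LLMInference(alias="fast", system_prompt=self.style)

    def forward(self, text: str) -> str:
        # Traced like an nn.Module forward pass
        return self.llm(text)
```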
Most LLM applications are systems, not single calls. plait moves the complexity into a shared runtime.
- Independent operations run concurrently without boilerplate. No manual `asyncio.gather()` required.
- Write normal Python code in `forward()`. Dependencies are discovered automatically via tracing.
- Maximum throughput with adaptive backpressure, rate limiting, and automatic retries.
- Backward passes propagate feedback to improve prompts, so you can optimize your pipeline during execution (see the sketch after this list).
- Use different models for different steps. Alias-based configuration separates logic from deployment.
- Prompts and instructions are `Parameter` objects that can be optimized via feedback.
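The backward pass itself does not appear in the code comparison below, so here is a rough sketch of what runtime optimization might look like, reusing the `ExtractAndCompare` pipeline and `resources` from the full example at the bottom of the page. The exact shape of the `backward()` call (invoked on the result, taking a free-text feedback string) is an assumption drawn from the PyTorch analogy, not a documented plait signature.

```python
# Hypothetical sketch: backward()'s exact signature is assumed from the
# PyTorch analogy, not taken from plait's documentation.
pipeline = ExtractAndCompare().bind(resources=resources)
result = await pipeline(doc1, doc2)

# Propagate natural-language feedback through the traced graph so that
# learnable Parameters (e.g. comparison_style) can be improved.
result.backward("The comparison should contrast pricing, not just features.")
```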
See how plait compares to other LLM frameworks on key features.
| Feature | plait | Pydantic AI | LangGraph | DSPy |
|---|---|---|---|---|
| Graph definition | Implicit (tracing) | Explicit | Explicit | Implicit |
| Parallel execution | Automatic | Manual | Explicit | Sequential |
| Multi-model | Alias-based | Per-agent | Per-node | Global config |
| Learnable params | Parameter class | No | No | Compile-time |
| Optimization | Runtime backward | No | No | Compile-time |
| Execution model | Async-first | Async | Async | Sync-first |
| PyTorch-like API | Yes | No | No | No |
Real-world performance on a fan-out workflow: 2 parallel extractions (gpt-4o-mini) + 1 comparison (gpt-4o).
plait is up to 2x faster and uses up to 99% less memory than alternatives.
plait:

```python
class ExtractAndCompare(Module):
    def forward(self, doc1: str, doc2: str) -> str:
        # These two calls are INDEPENDENT - plait runs them in PARALLEL
        facts1 = self.extractor(doc1)
        facts2 = self.extractor(doc2)

        # Formatter is a traced sub-module
        prompt = self.formatter(facts1, facts2)
        return self.comparer(prompt)

# That's it! No async boilerplate, no explicit parallelism.
# The tracer discovers independence and parallelizes automatically.
```
Pydantic AI:

```python
import asyncio

async def extract_and_compare(doc1: str, doc2: str) -> str:
    # Must explicitly use asyncio.gather for parallelism
    facts1, facts2 = await asyncio.gather(
        extractor.run(doc1),
        extractor.run(doc2),
    )

    # Manual string formatting
    prompt = f"Compare:\n{facts1.data}\n\nvs:\n{facts2.data}"
    result = await comparer.run(prompt)
    return result.data

# Requires understanding of async/await patterns.
# Parallelism must be manually identified and implemented.
```
LangGraph:

```python
def fan_out(state: State) -> list[Send]:
    # Must define explicit fan-out function
    return [
        Send("extract", {"doc": state["doc1"], "key": "facts1"}),
        Send("extract", {"doc": state["doc2"], "key": "facts2"}),
    ]

def reduce(results: list[str]) -> str:
    # Must define reducer to collect parallel results
    return "\n\n".join(results)

graph.add_conditional_edges(START, fan_out)
graph.add_node("extract", extract_node)
graph.add_node("compare", compare_node)

# Requires explicit graph construction with Send() and reducers.
# More boilerplate for simple parallel patterns.
```
DSPy:

```python
class ExtractAndCompare(dspy.Module):
    def forward(self, doc1: str, doc2: str) -> str:
        # Runs sequentially - no built-in parallelism
        facts1 = self.extract(document=doc1).facts  # Runs first
        facts2 = self.extract(document=doc2).facts  # Then second

        # Compare after both complete
        result = self.compare(facts1=facts1, facts2=facts2)
        return result.comparison

# DSPy is sync-first. These calls execute one after another.
# No automatic parallelism for independent operations.
```
A complete pipeline showing automatic parallelism, multi-model configuration, and learnable parameters.
```python
from plait import Module, LLMInference, Parameter
from plait.resources import OpenAIEndpointConfig, ResourceConfig


class CompareFormatter(Module):
    """Format two fact lists for comparison (traced sub-module)."""

    def forward(self, facts1: str, facts2: str) -> str:
        return f"Compare:\n{facts1}\n\nvs:\n{facts2}"


class ExtractAndCompare(Module):
    """Extract facts from two documents and compare them."""

    def __init__(self):
        super().__init__()
        # Learnable parameter - can be optimized via backward pass
        self.comparison_style = Parameter(
            value="Highlight key similarities and differences.",
            description="Controls the style of comparison output.",
        )
        self.extractor = LLMInference(
            alias="fast",
            system_prompt="Extract the main facts as a bulleted list.",
        )
        self.formatter = CompareFormatter()  # Traced sub-module
        self.comparer = LLMInference(
            alias="smart",
            system_prompt=self.comparison_style,
        )

    def forward(self, doc1: str, doc2: str) -> str:
        # These two calls are INDEPENDENT - plait runs them in PARALLEL
        facts1 = self.extractor(doc1)
        facts2 = self.extractor(doc2)

        # Formatter is a sub-module - participates in tracing
        prompt = self.formatter(facts1, facts2)
        return self.comparer(prompt)


# Configure endpoints separately from module definition
resources = ResourceConfig(
    endpoints={
        "fast": OpenAIEndpointConfig(model="gpt-4o-mini", max_concurrent=20),
        "smart": OpenAIEndpointConfig(model="gpt-4o", max_concurrent=5),
    }
)

# Bind resources and execute
pipeline = ExtractAndCompare().bind(resources=resources)
result = await pipeline(doc1, doc2)
```
- Prompts stored as `Parameter` can be optimized through backward passes with feedback.
- Independent calls to `extractor` are traced and executed concurrently - no async boilerplate needed.
- Use aliases like "fast" and "smart" to decouple module logic from endpoint configuration.
- The same pipeline can run against different endpoints - dev, prod, or self-hosted models (see the sketch after this list).
- The tracer captures the dependency graph from your `forward()` method.
- Independent branches execute in parallel automatically. Sub-modules like `formatter` participate in tracing.
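As one illustration of that decoupling, the same module could be bound to a hypothetical "dev" configuration. The classes and keyword arguments below are the ones used in the example above; the specific model choices and concurrency limits are assumptions for illustration.

```python
# Hypothetical dev configuration: cheaper model, lower concurrency.
dev_resources = ResourceConfig(
    endpoints={
        "fast": OpenAIEndpointConfig(model="gpt-4o-mini", max_concurrent=5),
        "smart": OpenAIEndpointConfig(model="gpt-4o-mini", max_concurrent=2),
    }
)

# Same pipeline definition, different deployment - no module changes needed.
dev_pipeline = ExtractAndCompare().bind(resources=dev_resources)
```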
```bash
uv add pyplait
```

Published as pyplait on PyPI, imported as plait. Requires Python 3.13+.
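Note that the package name and import name differ; the imports below match the ones used in the full example above.

```python
# Installed from PyPI as "pyplait", imported as "plait"
from plait import Module, LLMInference, Parameter
```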