Technical Advisory: LangChain Serialization Injection Enables Secret Extraction

Giuseppe Trovato
Head of Research

Attackers who can influence data passed through LangChain's serialization functions can extract environment secrets and instantiate arbitrary classes when that data is later deserialized.

Disclosed: December 23, 2025

Severity: Critical (CVSS 9.3 - CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:L/A:N)

Package/Component: langchain-core

Advisories: GHSA-c67j-w6g6-q2cm, CVE-2025-68664

Exploitation Status: No public exploits or active exploitation reported as of disclosure date

Executive Summary

What Happened: A critical serialization injection vulnerability was discovered in LangChain’s core serialization functions (dumps() and dumpd()), allowing attackers to extract environment variables, secrets, and instantiate arbitrary classes when untrusted data is deserialized.

Why It Matters:

  • Supply Chain Risk: LangChain is a foundational framework for AI applications, making this vulnerability impactful across numerous production AI systems and agent deployments
  • Secret Exposure: Default configurations automatically load environment variables during deserialization, exposing API keys, database credentials, and other sensitive secrets without any authentication
  • Scope Change Impact: The vulnerability can impact resources beyond the vulnerable component, creating cross-system compromise risks particularly severe in multi-tenant or containerized AI deployments
  • AI Agent Exploitation: Attackers can leverage prompt injection against LLMs to manipulate serialized message histories, enabling indirect exploitation through seemingly benign user interactions—this is the primary attack vector
  • Stealth Persistence: The vulnerability affects streaming operations and message history mechanisms, creating opportunities for repeated exploitation across user sessions

High-Level Risks:

  • Credential theft through environment variable extraction
  • Data exfiltration via arbitrary class instantiation
  • Lateral movement using extracted cloud provider credentials
  • Supply chain compromise through dependency exploitation

Immediate Actions:

  1. Upgrade to langchain-core 1.2.5+ (v1.x branch) or 0.3.81+ (v0.3.x branch) - Note: Breaking changes in patched versions
  2. Rotate all API keys and secrets accessible to LangChain applications
  3. Review audit logs for unusual deserialization patterns or unexpected network activity
  4. If immediate upgrade is not possible, explicitly set secrets_from_env=False in all load()/loads() calls and avoid deserializing data from untrusted sources (the allowed_objects parameter is only available in patched versions)

Overview

The vulnerability stems from insufficient input escaping in LangChain’s serialization functions. When dictionaries containing the special 'lc' key—LangChain’s internal marker for serialized objects—are included in user-controlled data, the deserialization process treats injected structures as legitimate LangChain objects rather than plain data. This allows attackers to craft malicious payloads that extract secrets from environment variables or instantiate arbitrary classes within trusted namespaces during deserialization.
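To make the failure mode concrete, here is a deliberately simplified toy deserializer, not LangChain's actual code, that mirrors the behavior described above: any dictionary carrying the internal 'lc' marker is treated as a serialized object, so attacker-supplied plain data becomes indistinguishable from framework-generated structures.

```python
import json
import os

def toy_loads(text, secrets_from_env=True):
    """Toy deserializer sketching the pre-patch behavior (not LangChain's code)."""
    def revive(obj):
        if isinstance(obj, dict):
            if obj.get("lc") == 1 and obj.get("type") == "secret" and secrets_from_env:
                # Pre-patch default: resolve the named environment variable.
                return os.environ.get(obj["id"][0])
            return {k: revive(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [revive(v) for v in obj]
        return obj
    return revive(json.loads(text))

# User-controlled data that merely *looks* like a serialized secret:
os.environ["DEMO_API_KEY"] = "sk-demo-123"  # planted stand-in secret for the demo
user_data = {"note": {"lc": 1, "type": "secret", "id": ["DEMO_API_KEY"]}}
serialized = json.dumps(user_data)  # the 'lc' key is not escaped

print(toy_loads(serialized))                          # secret value leaks
print(toy_loads(serialized, secrets_from_env=False))  # patched default: no lookup
```

With `secrets_from_env=False` (the patched default), the same payload round-trips as inert data, which is why the configuration change in the patched versions closes the secret-extraction path.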

Risk Analysis

This threat is particularly concerning in the context of agentic AI systems due to:

  1. Prompt Injection Attack Surface: LLM responses containing manipulated additional_kwargs fields can be weaponized through prompt injection, enabling indirect exploitation in which the attacker never directly controls the serialization input.
  2. Persistent Exploitation in Conversational Agents: Applications using RunnableWithMessageHistory or streaming operations (astream_events, astream_log) continuously serialize and deserialize message histories, creating repeated opportunities for exploitation across multiple user interactions.
  3. Transitive Trust Violations: Vector stores, caches, and external document sources become attack vectors when deserialized content originates from untrusted third-party systems, violating the assumption that persisted data is safe to reload.

Exploitation Prerequisites

For successful exploitation, the following conditions must be met:

  1. Serialization of User-Influenced Data: Application must call dumps() or dumpd() on data containing or influenced by user input
  2. Deserialization Cycle: Application must later call load() or loads() on that serialized data
  3. Vulnerable Configuration (pre-patch): secrets_from_env=True (was the default before patches)
  4. Accessible Secrets: Environment variables must contain exploitable secrets (API keys, credentials, tokens)

Note: Not all LangChain usage is vulnerable; only applications with this specific serialization/deserialization flow are at risk.

Technical Details

Primary Attack Vector: Prompt Injection via LLM Response Fields

The most common exploitation path leverages LLM response manipulation:

  1. Attacker crafts malicious prompts instructing the LLM to include specific JSON structures in response metadata fields (additional_kwargs, response_metadata)
  2. Application serializes LLM responses for streaming operations (astream_events(version="v1"), astream_log()) or message history storage
  3. Deserialization executes the injected payload when the application loads the message history or processes streamed events
  4. Secrets are extracted and appear in application responses, logs, or are exfiltrated through side channels

This is particularly insidious because the attacker never directly controls serialization input; the LLM does it for them through prompt injection.
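The following sketch shows a hypothetical LLM response whose metadata was steered by prompt injection; the field names follow the payload shape described above, and the point is that a naive serialization round trip preserves the injected marker verbatim.

```python
import json

# Hypothetical LLM response shaped by a prompt-injection attack. The attacker
# never touches the application's inputs directly; the model emits the
# structure into additional_kwargs.
llm_response = {
    "content": "Sure, here is the summary you asked for.",
    "additional_kwargs": {
        # Injected structure using the payload shape described above:
        "lc": 1,
        "type": "secret",
        "id": ["OPENAI_API_KEY"],
    },
}

# The application serializes the response for message history or streaming.
# Without escaping, the injected marker survives the round trip intact and
# remains a live payload for the next deserialization.
stored = json.dumps(llm_response)
restored = json.loads(stored)
print(restored["additional_kwargs"]["lc"])  # → 1
```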

Secondary Attack Vectors:

1. Direct Serialization Injection:

  • Attacker provides input containing {"lc": 1, "type": "secret", "id": ["ENV_VAR_NAME"]}
  • Application serializes this data using dumps() or dumpd() without escaping the 'lc' key
  • When load() or loads() deserializes the data with secrets_from_env=True (previous default), the secret structure is interpreted as a LangChain secret object
  • The deserialization process automatically loads the environment variable value and returns it to the attacker

2. Vector Store Document Injection:

  • Attacker uploads malicious documents to vector stores (e.g., via RAG pipelines)
  • Documents contain serialized injection payloads in metadata fields
  • Application calls InMemoryVectorStore.load() or similar deserialization on untrusted documents
  • Deserialization triggers secret extraction or class instantiation

3. Arbitrary Class Instantiation:

  • Attacker injects serialized structures referencing classes within langchain_core, langchain, or langchain_community namespaces
  • Deserialization instantiates these classes with attacker-controlled constructor parameters
  • Class constructors execute side effects such as network requests, file operations, or template rendering (Jinja2)
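A simplified sketch of why class instantiation alone is dangerous, again not LangChain's actual loader: once a deserializer resolves an "id" path to a class in a trusted namespace, any side effects in that class's constructor run with attacker-chosen parameters. The `TemplateLike` class and registry here are illustrative stand-ins.

```python
# Toy sketch of the class-instantiation risk (not LangChain's code).
SIDE_EFFECTS = []

class TemplateLike:
    """Stand-in for a class (e.g. a Jinja2-backed template) whose
    constructor does real work with its arguments."""
    def __init__(self, template):
        # In a real template engine this could render or execute content.
        SIDE_EFFECTS.append(f"rendered: {template}")

# Toy registry standing in for "classes importable from trusted namespaces".
TRUSTED = {("demo", "TemplateLike"): TemplateLike}

def toy_load_constructor(obj):
    cls = TRUSTED.get(tuple(obj.get("id", ())))
    if cls is not None and obj.get("lc") == 1 and obj.get("type") == "constructor":
        # The attacker controls the kwargs passed to a trusted class.
        return cls(**obj["kwargs"])
    return obj

payload = {
    "lc": 1,
    "type": "constructor",
    "id": ["demo", "TemplateLike"],
    "kwargs": {"template": "{{ malicious }}"},
}
toy_load_constructor(payload)
print(SIDE_EFFECTS)  # the constructor side effect ran during deserialization
```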

Impact

Systems affected by this threat may experience:

  • Secret Exfiltration: Environment variables containing API keys (OpenAI, Anthropic, AWS, Azure), database credentials, and internal service tokens are exposed through deserialization
  • Privilege Escalation: Extracted cloud provider credentials enable lateral movement to other services and resources within the organization’s infrastructure
  • Remote Code Execution: Jinja2 template instantiation allows arbitrary Python code execution when templates are controlled by attackers
  • Data Integrity Compromise: Arbitrary class instantiation can modify application state, corrupt databases, or manipulate vector store embeddings
  • Denial of Service: Constructor side effects can trigger resource-intensive operations, network floods, or infinite loops
  • Compliance Violations: Secret exposure may constitute a breach under GDPR, CCPA, SOC 2, ISO 27001, and other regulatory frameworks

Affected Versions

The following versions are known to be affected:

  • langchain-core ≥ 1.0.0, < 1.2.5 (v1.x branch)
  • langchain-core < 0.3.81 (v0.3.x branch)

Patched Versions:

  • langchain-core >= 1.2.5 (for v1.x users)
  • langchain-core >= 0.3.81 (for v0.3.x users)

Check Your Version:

pip show langchain-core
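For fleet-wide checks, the version can also be inspected programmatically; the `is_patched` helper below is our own illustrative code (using the patched version thresholds stated above), not a LangChain API.

```python
from importlib import metadata
from itertools import takewhile

def parse_version(v):
    """Tiny parser for numeric dotted versions; good enough for this check."""
    parts = []
    for piece in v.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def is_patched(version):
    """True if the given langchain-core version includes the fix
    (>= 1.2.5 on the 1.x branch, >= 0.3.81 on the 0.3.x branch)."""
    v = parse_version(version)
    if v >= (1,):
        return v >= (1, 2, 5)
    return v >= (0, 3, 81)

try:
    installed = metadata.version("langchain-core")
    status = "patched" if is_patched(installed) else "VULNERABLE - upgrade now"
    print(f"langchain-core {installed}: {status}")
except metadata.PackageNotFoundError:
    print("langchain-core is not installed in this environment")
```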

Immediate Mitigation Steps

  1. Upgrade to Patched Versions - Breaking Changes Alert:
    • For v1.0.0+ users: pip install --upgrade 'langchain-core>=1.2.5'
    • For v0.3.x users: pip install --upgrade 'langchain-core>=0.3.81'

      Important Breaking Changes in Patched Versions:
      • secrets_from_env parameter now defaults to False (was True)
      • New allowed_objects parameter defaults to 'core' (restricts to langchain_core only)
      • New init_validator parameter blocks Jinja2 template execution by default
      • Applications relying on automatic secret loading or custom classes will need code updates
  2. Rotate Compromised Secrets:
    • Immediately rotate all API keys, database credentials, and service tokens accessible to LangChain applications
    • Review application logs for evidence of secret extraction (unusual deserialization errors, unexpected network connections)
    • Check cloud provider audit logs for unauthorized access using potentially extracted credentials
    • Monitor for anomalous API usage patterns indicating compromised keys
  3. Implement Detection and Monitoring:
    • Add application logging around load() and loads() calls to capture deserialization sources
    • Monitor for deserialization of objects with 'lc' keys in user-controlled data
    • Alert on environment variable access patterns during deserialization operations
    • Implement rate limiting on streaming endpoints to prevent rapid exploitation attempts
  4. Apply Secure Configuration Defaults:
    • Explicitly set secrets_from_env=False when calling load() or loads() on untrusted data
    • Use the allowed_objects parameter to restrict deserialization to known-safe classes
    • Set init_validator to block Jinja2 template execution unless absolutely necessary
    • Validate and sanitize all user-controlled data before serialization
  5. Restrict Deserialization Scope:
    • Avoid deserializing data from untrusted sources entirely
    • Use allowlists for accepted metadata fields in LLM responses
    • Implement separate serialization paths for trusted vs untrusted data
  6. Network-Level Controls:
    • Monitor and restrict outbound connections from applications using LangChain
    • Block environment variable access at the OS level for the application user
    • Use containerization with minimal environment variable exposure
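The monitoring item above, "deserialization of objects with 'lc' keys in user-controlled data", can be implemented as a simple pre-deserialization guard. This is our own illustrative helper, not a LangChain API; the policy (reject outright) is one reasonable choice.

```python
def contains_lc_marker(obj):
    """Recursively check untrusted data for LangChain's 'lc' serialization
    marker before it is ever passed to dumps() or load()."""
    if isinstance(obj, dict):
        if "lc" in obj:
            return True
        return any(contains_lc_marker(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return any(contains_lc_marker(v) for v in obj)
    return False

def guard_untrusted(data):
    """Raise on any untrusted payload that smuggles in an 'lc' key."""
    if contains_lc_marker(data):
        raise ValueError("refusing to process data containing an 'lc' marker")
    return data

safe = {"user": "alice", "note": "hello"}
malicious = {"meta": [{"lc": 1, "type": "secret", "id": ["AWS_SECRET_ACCESS_KEY"]}]}
guard_untrusted(safe)  # passes unchanged
try:
    guard_untrusted(malicious)
except ValueError as e:
    print("blocked:", e)
```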

Long-term Recommendations

  1. Architectural Security Controls:
    • Implement secret management using dedicated vaults instead of environment variables
    • Separate secrets by least-privilege boundaries; avoid single applications having access to all organizational credentials
    • Use short-lived tokens and credential rotation policies to minimize exposure windows
    • Implement application-level access controls on serialization/deserialization operations
  2. Input Validation and Sanitization:
    • Establish allowlists for metadata fields accepted from LLM responses and external sources
    • Strip or escape 'lc' keys from all user-controlled dictionaries before serialization
    • Validate JSON schemas for message histories and streaming event payloads
    • Implement content security policies for prompt inputs to detect injection patterns
    • Use structured outputs from LLMs with strict type validation rather than free-form JSON
  3. Agent Security Governance:
    • Conduct security reviews of all LangChain agent implementations focusing on serialization flows
    • Establish incident response procedures specific to AI agent compromise scenarios
    • Train development teams on secure serialization practices and prompt injection defenses
    • Maintain an inventory of all dependencies using LangChain to enable rapid patching
    • Implement security testing for prompt injection vulnerabilities in CI/CD pipelines
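The "strip or escape 'lc' keys" recommendation above can be sketched as a pre-serialization transform. The sentinel key name is an illustrative choice of ours; any reversible, collision-free rename works, provided the escaping is applied consistently before dumps() and reversed only for display, never before load().

```python
def escape_lc_keys(obj, sentinel="__user_lc__"):
    """Rename bare 'lc' keys in user-controlled data before serialization,
    so a deserializer cannot mistake them for the framework's own marker."""
    if isinstance(obj, dict):
        return {
            (sentinel if k == "lc" else k): escape_lc_keys(v, sentinel)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [escape_lc_keys(v, sentinel) for v in obj]
    return obj

untrusted = {"profile": {"lc": 1, "type": "secret", "id": ["DATABASE_URL"]}}
print(escape_lc_keys(untrusted))
# → {'profile': {'__user_lc__': 1, 'type': 'secret', 'id': ['DATABASE_URL']}}
```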

Indicators of Compromise

Organizations should search logs for the following patterns:

Application Logs:

  • Deserialization errors mentioning 'lc' keys in unexpected contexts
  • Log entries showing environment variable access during load() or loads() operations
  • Warnings about unknown or unexpected class types during deserialization
  • JSON structures in user inputs containing {"lc": 1, "type": "secret", ...}
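A minimal log-triage sketch for the patterns above: flag lines that carry both the 'lc' marker and a secret-type tag. The regexes are ours and deliberately loose about whitespace; tune them to your logging format.

```python
import re

# Markers from the IoC payload shape: {"lc": 1, "type": "secret", ...}
LC_MARKER = re.compile(r'"lc"\s*:\s*1')
SECRET_TYPE = re.compile(r'"type"\s*:\s*"secret"')

def suspicious_lines(log_lines):
    """Return log lines containing both injection markers."""
    return [
        line for line in log_lines
        if LC_MARKER.search(line) and SECRET_TYPE.search(line)
    ]

logs = [
    'INFO request ok user=42',
    'DEBUG history={"lc": 1, "type": "secret", "id": ["OPENAI_API_KEY"]}',
    'INFO stream event content="hello"',
]
for line in suspicious_lines(logs):
    print("IOC hit:", line)
```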

Network Logs:

  • Unexpected outbound connections initiated during deserialization operations
  • Connections to external services not part of normal application behavior
  • Data exfiltration to attacker-controlled endpoints following LLM interactions

Framework Context

This incident aligns with multiple security frameworks:

OWASP Agentic Security & Integrity (ASI) 2026:

  • ASI01:2026 Agent Goal Hijack - Prompt injection manipulates agent objectives through serialized message histories, redirecting autonomous behavior toward attacker-controlled goals
  • ASI02:2026 Tool Misuse and Exploitation - Agents misuse legitimate tools when deserialization triggers unintended actions via prompt-injected message histories
  • ASI03:2026 Identity and Privilege Abuse - Extracted credentials enable privilege escalation across agent delegation chains and interconnected systems
  • ASI04:2026 Agentic Supply Chain Vulnerabilities - LangChain as a foundational dependency creates cascading risk across agent ecosystems and runtime tool composition
  • ASI05:2026 Unexpected Code Execution (RCE) - Unsafe deserialization converts serialized structures into executable code through Jinja2 templates and arbitrary class instantiation
  • ASI06:2026 Memory & Context Poisoning - Attackers corrupt agent memory stores and vector databases with serialized injection payloads that persist across sessions

OWASP Agentic AI Threats & Mitigations (AATM) v1.0:

  • T1 Memory Poisoning - Malicious serialized data injected into agent memory stores influences subsequent autonomous reasoning and tool selection
  • T2 Tool Misuse - Dynamic tool integration exploited through deserialization-triggered actions beyond intended authorization scope
  • T3 Privilege Compromise - Credential extraction enables escalation across agent delegation chains and cross-system trust relationships
  • T6 Intent Breaking - Prompt injection manipulates goal inference through poisoned message histories, hijacking agent objectives mid-execution
  • T11 Unexpected RCE - Unsafe serialization converts attacker-controlled text into executable code via template engines and dynamic class instantiation
  • T12 Communication Poisoning - Serialized injection payloads corrupt inter-agent message exchanges in distributed agentic workflows

OWASP LLM Top 10 (2025):

  • LLM01:2025 Prompt Injection - Primary attack vector leveraging prompt manipulation to inject malicious structures into serialized LLM response metadata
  • LLM03:2025 Supply Chain - LangChain dependency vulnerability creates systemic exposure across AI application ecosystems
  • LLM06:2025 Excessive Agency - Autonomous agents with serialization access can execute privileged operations when credentials are extracted from environment

Common Weakness Enumeration (CWE):

  • CWE-502 Deserialization of Untrusted Data - Insecure deserialization allowing attacker control over object instantiation and execution flow through crafted serialized payloads
  • CWE-94 Improper Control of Generation of Code - Arbitrary class instantiation with attacker-controlled parameters enabling dynamic code generation via template engines

Updates

We will update this analysis as more information becomes available. Please monitor our security channels for the latest updates.
