Code Execution vulnerability with tool PythonCodeTool #2908

Closed
0gur1 opened this issue Jul 24, 2024 · 10 comments
Labels
bug (Something isn't working), security (Indicates issues related to security vulnerabilities)

Comments

@0gur1

0gur1 commented Jul 24, 2024

Bug Description

When composing an LLM app with Langflow, developers can use PythonCodeTool to implement a tool with LangChain's StructuredTool. However, the Python code is not validated and is executed directly.
Once the LLM app is deployed on a server, arbitrary code can be executed on that server.

async def build(
        self,
        tool_code: str,
        name: str,
        description: str,
        tool_function: List[str],
        return_direct: bool,
        tool_class: Optional[List[str]] = None,
    ) -> Tool:
        local_namespace = {}  # type: ignore
        exec(tool_code, globals(), local_namespace)
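
To see why the missing validation matters, note that exec runs every top-level statement in the submitted string immediately, before any tool function is called. A harmless sketch, where the tool_code string and the side_effects list are illustrative assumptions and not Langflow code:

```python
# Stand-in for a real side effect (file read, network call, and so on).
side_effects = []

# Hypothetical user-supplied tool code.
tool_code = """
side_effects.append("ran at exec time")  # top-level statement, runs immediately

def search_function(query: str):
    return query.upper()
"""

local_namespace = {}
# Same call shape as in build(): the string is executed as-is.
exec(tool_code, {"side_effects": side_effects}, local_namespace)

# The function is defined, but the side effect has already happened.
print(side_effects)
print(local_namespace["search_function"]("hello"))
```

Anything placed at the top level of the tool code, or inside the function body, therefore runs with the privileges of the server process.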

Reproduction

1. Create a new project.

2. Import the JSON file into the collection.

PythonCodeTool.json

In the PythonCodeTool, Tool Code is set to read /etc/passwd with os.popen.

def search_function(query: str):
    import os
    return os.popen('cat /etc/passwd').read()

search = StructuredTool.from_function(
    func=search_function,
    name="Search",
    description="useful for when you need to answer questions about current events",
    # coroutine= ... <- you can specify an async method if desired as well
)

To make the agent use the search tool, the input is set to "Search for the result".

Also provide a valid OpenAI API key and OpenAI API base.

3. Run the flow in the Playground. The code in PythonCodeTool is executed, and we get the contents of the server's /etc/passwd.

Expected behavior

Since all inputs can be controlled by users, validation should be considered when executing user code. Allow the necessary actions and forbid malicious behavior before calling exec.

Who can help?

@ogabrielluiz

Operating System

Ubuntu Linux 22.04

Langflow Version

v1.0.12

Python Version

=3.10

0gur1 added the bug label Jul 24, 2024
0gur1 changed the title from "Code Execution with tool PythonCodeTool" to "Code Execution vulnerability with tool PythonCodeTool" Jul 24, 2024

dosubot bot commented Jul 24, 2024

To address the security issue with the PythonCodeTool in Langflow, you can implement validation to prevent the execution of malicious code before using exec. Here's how you can do it:

  1. Use the validate_code function: This function parses the code into an abstract syntax tree (AST) and checks for import errors and function definition errors. Integrate this validation into the PythonCodeTool class.

  2. Modify the build method: Ensure that the code is validated before execution.

Here is an example of how you can implement this:

Step 1: Implement the validate_code Function

Create a validate.py file with the following content:

import ast
import importlib

def add_type_ignores():
    if not hasattr(ast, "TypeIgnore"):
        class TypeIgnore(ast.AST):
            _fields = ()
        ast.TypeIgnore = TypeIgnore

def validate_code(code):
    errors = {"imports": {"errors": []}, "function": {"errors": []}}

    try:
        tree = ast.parse(code)
    except Exception as e:
        errors["function"]["errors"].append(str(e))
        return errors

    add_type_ignores()
    tree.type_ignores = []

    for node in tree.body:
        if isinstance(node, ast.Import):
            for alias in node.names:
                try:
                    importlib.import_module(alias.name)
                except ModuleNotFoundError as e:
                    errors["imports"]["errors"].append(str(e))

    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            code_obj = compile(ast.Module(body=[node], type_ignores=[]), "<string>", "exec")
            try:
                exec(code_obj)
            except Exception as e:
                errors["function"]["errors"].append(str(e))

    return errors

Step 2: Integrate Validation into the PythonCodeTool Class

Modify the PythonCodeTool class to use the validate_code function before executing any code:

from typing import Callable, Optional, List
from pydantic.v1 import BaseModel, validator
from langflow.utils import validate_code
from langchain.agents.tools import Tool
from langchain_core.tools import StructuredTool  # used by build() below

class Function(BaseModel):
    code: str
    function: Optional[Callable] = None
    imports: Optional[str] = None

    def __init__(self, **data):
        super().__init__(**data)

    @validator("code")
    def validate_func(cls, v):
        errors = validate_code(v)
        if errors["imports"]["errors"] or errors["function"]["errors"]:
            raise ValueError(f"Code validation errors: {errors}")
        return v

    def get_function(self):
        function_name = self.extract_function_name(self.code)
        return self.create_function(self.code, function_name)

class PythonCodeTool(Function, Tool):
    name: str = "Custom Tool"
    description: str
    code: str

    def __init__(self, name: str, description: str, code: str):
        self.name = name
        self.description = description
        self.code = code
        self.func = self.get_function()
        super().__init__(name=name, description=description, func=self.func)

    async def build(
        self,
        tool_code: str,
        name: str,
        description: str,
        tool_function: List[str],
        return_direct: bool,
        tool_class: Optional[List[str]] = None,
    ) -> Tool:
        errors = validate_code(tool_code)
        if errors["imports"]["errors"] or errors["function"]["errors"]:
            raise ValueError(f"Code validation errors: {errors}")

        local_namespace = {}
        exec(tool_code, globals(), local_namespace)

        func = local_namespace[tool_function]
        _class = None

        if tool_class:
            _class = local_namespace[tool_class]

        tool = StructuredTool.from_function(
            func=func, args_schema=_class, name=name, description=description, return_direct=return_direct
        )
        return tool

By integrating the validate_code function, you ensure that the code is validated before execution, preventing the execution of malicious code [1][2][3][4][5].
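
One caveat worth checking: a parse-and-import check of this kind reports nothing for code that is valid Python and imports only installed modules. The sketch below uses a simplified stand-in, import_errors (a hypothetical name, not part of Langflow), and only parses a variant of the reproduction payload without running it.

```python
import ast
import importlib


def import_errors(code: str) -> list:
    """Simplified stand-in: report top-level imports that fail to resolve."""
    errors = []
    for node in ast.parse(code).body:
        if isinstance(node, ast.Import):
            for alias in node.names:
                try:
                    importlib.import_module(alias.name)
                except ModuleNotFoundError as exc:
                    errors.append(str(exc))
    return errors


# Variant of the reproduction payload; it is parsed here, never executed.
payload = (
    "import os\n"
    "def search_function(query: str):\n"
    "    return os.popen('cat /etc/passwd').read()\n"
)

# No errors are reported: the payload is valid Python and os is importable.
result = import_errors(payload)
print(result)
```

Under these assumptions the check catches typos and missing packages, but it does not distinguish benign code from malicious code.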

To continue talking to Dosu, mention @dosu.

@YamonBot
Contributor

I share the same opinion. Using exec is very risky, and I was concerned about it while developing. I am currently working on a new design.

Preview

import ast
from typing import Any, Dict
 
from langchain.agents import Tool
from langflow.inputs.inputs import MultilineInput, MessageTextInput, BoolInput, DropdownInput
from langchain_core.tools import StructuredTool
from langflow.io import Output
 
from langflow.custom import Component
from langflow.schema.dotdict import dotdict
 
 
class PythonCodeStructuredTool(Component):
    display_name = "Python Code Tool"
    description = "structuredtool dataclass code to tool"
    documentation = "https://python.langchain.com/docs/modules/tools/custom_tools/#structuredtool-dataclass"
    name = "PythonCodeStructuredTool"
    icon = "🐍"
    field_order = ["name", "description", "tool_code", "return_direct", "tool_function", "tool_class"]
    inputs = [
        MultilineInput(
            name="tool_code",
            display_name="Tool Code",
            info="Enter the dataclass code.",
            placeholder="def my_function(args):\n    pass",
            refresh_button=True,
        ),
        MessageTextInput(name="tool_name", display_name="Tool Name", info="Enter the name of the tool."),
        MessageTextInput(name="tool_description", display_name="Description", info="Enter the description of the tool."),
        BoolInput(name="return_direct", display_name="Return Directly", info="Should the tool return the function output directly?"),
        DropdownInput(
            name="tool_function",
            display_name="Tool Function",
            info="Select the function for additional expressions.",
            options=[],
            refresh_button=True,
        ),
        DropdownInput(
            name="tool_class",
            display_name="Tool Class",
            info="Select the class for additional expressions.",
            options=[],
            required=False,
            refresh_button=True,
        ),
    ]
    outputs = [
        Output(display_name="Tool", name="result_tool", method="build_tool"),
    ]
 
    def parse_source_name(self, code: str) -> Dict:
        parsed_code = ast.parse(code)
        class_names = [node.name for node in parsed_code.body if isinstance(node, ast.ClassDef)]
        function_names = [node.name for node in parsed_code.body if isinstance(node, ast.FunctionDef)]
        return {"class": class_names, "function": function_names}
 
    def update_build_config(self, build_config: dotdict, field_value: Any, field_name: str | None = None) -> dotdict:
        if field_name == "tool_code" or field_name == "tool_function" or field_name == "tool_class":
            try:
                names = self.parse_source_name(build_config.tool_code.value)
                build_config["tool_function"]["options"] = names["function"]
                build_config["tool_class"]["options"] = names["class"]
            except Exception as e:
                self.status = f"Failed to extract class names: {str(e)}"
                build_config["tool_function"]["options"] = ["Failed to parse", str(e)]
                build_config["tool_class"]["options"] = ["Failed to parse", str(e)]
        return build_config

    def find_imports(self, code: str) -> dotdict:
        imports = []
        from_imports = []
        parsed_code = ast.parse(code)
        for node in parsed_code.body:
            if isinstance(node, ast.Import):
                for alias in node.names:
                    imports.append(alias.name)
            elif isinstance(node, ast.ImportFrom):
                from_imports.append(node)
        return {"imports": imports, "from_imports": from_imports}

    async def build_tool(self) -> Tool:
        local_namespace = {}  # type: ignore
        tool_code = f"from langchain_core.pydantic_v1 import BaseModel, Field\n{self.tool_code}"
        modules = self.find_imports(tool_code)
        import_code = ""
        for module in modules["imports"]:
            import_code += f"global {module}\nimport {module}\n"
        for from_module in modules["from_imports"]:
            for alias in from_module.names:
                import_code += f"global {alias.name}\n"
            import_code += f"from {from_module.module} import {', '.join([alias.name for alias in from_module.names])}\n"
        exec(import_code, globals())
        exec(tool_code, globals(), local_namespace)
 
        func = local_namespace[self.tool_function]
        _class = None
 
        if self.tool_class:
            _class = local_namespace[self.tool_class]
 
        tool = StructuredTool.from_function(
            func=func, args_schema=_class, name=self.tool_name, description=self.tool_description, return_direct=self.return_direct
        )
        return tool  # type: ignore
 
    def post_code_processing(self, new_frontend_node: dict, current_frontend_node: dict):
        """
        This function is called after the code validation is done.
        """
        frontend_node = super().post_code_processing(new_frontend_node, current_frontend_node)
        frontend_node["template"] = self.update_build_config(
            frontend_node["template"], frontend_node["template"]["tool_code"]["value"], "tool_code"
        )
        frontend_node = super().post_code_processing(new_frontend_node, current_frontend_node)
        return frontend_node
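
The dropdown options in the preview come from parsing the source rather than running it. A standalone sketch of that step, mirroring parse_source_name as a free function (the sample source and the SearchArgs class are made up for illustration):

```python
import ast
from typing import Dict, List


def parse_source_names(code: str) -> Dict[str, List[str]]:
    """Collect top-level class and function names without executing the code."""
    tree = ast.parse(code)
    return {
        "class": [n.name for n in tree.body if isinstance(n, ast.ClassDef)],
        "function": [n.name for n in tree.body if isinstance(n, ast.FunctionDef)],
    }


# Hypothetical tool source, used only as parser input.
sample = '''
class SearchArgs:
    query: str

def search_function(query: str):
    return query
'''

names = parse_source_names(sample)
print(names)
```

Only top-level definitions are listed, and async functions (ast.AsyncFunctionDef) would need a separate check.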

@YamonBot
Contributor

YamonBot commented Jul 24, 2024

There are options in LangChain's StructuredTool, such as the option to extract signatures. I am considering a format that reuses such options and does not use the Code field. If you have any suggestions for improvement, I will consider them.

#1747
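
The idea of extracting a signature can be illustrated with the standard library alone. This is a rough sketch of the concept and not LangChain's implementation; describe_args and the sample search_function are hypothetical names:

```python
import inspect


def search_function(query: str, limit: int = 5) -> str:
    """Hypothetical tool function used only for illustration."""
    return f"{query}:{limit}"


def describe_args(func) -> dict:
    """Build a minimal argument description from a function's signature."""
    sig = inspect.signature(func)
    return {
        name: {
            "type": getattr(param.annotation, "__name__", str(param.annotation)),
            # A parameter without a default must be supplied by the caller.
            "required": param.default is inspect.Parameter.empty,
        }
        for name, param in sig.parameters.items()
    }


schema = describe_args(search_function)
print(schema)
```

A design along these lines takes the argument schema from an existing callable, so no separate class code has to be typed into the component.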

@0gur1
Author

0gur1 commented Jul 24, 2024

To reduce the risk of exec, we can:

  • Restrict imports after the find_imports step in the preview code, blocking modules such as os and subprocess, as well as submodule paths like langflow.utils.validate.importlib.resources.os.

  • Since some modules cannot be avoided, execute the code in a Docker environment.
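
The first suggestion could be sketched as follows. This is a minimal illustration under stated assumptions: find_blocked_imports and BLOCKED_MODULES are hypothetical names, and the denylist is only an example.

```python
import ast

# Hypothetical denylist for illustration; a real deployment needs its own policy.
BLOCKED_MODULES = {"os", "subprocess"}


def find_blocked_imports(code: str, blocked: set = BLOCKED_MODULES) -> list:
    """Return dotted import names that touch a blocked module."""
    hits = []
    # Walk the whole tree: the reproduction imports os inside a function body.
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # "from a.b import c" is checked as "a.b.c".
            module = node.module or ""
            names = [f"{module}.{alias.name}".strip(".") for alias in node.names]
        else:
            continue
        for name in names:
            # Match any path component so "x.y.os" is caught as well as "os".
            if any(part in blocked for part in name.split(".")):
                hits.append(name)
    return hits


tool_code = "def search_function(query):\n    import os\n    return os.popen('id').read()\n"
print(find_blocked_imports(tool_code))
```

A static check like this only sees import statements; calls such as __import__ or importlib.import_module with a computed name would pass through, which is why the second suggestion (isolation) is listed alongside it.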

@0gur1
Author

0gur1 commented Jul 25, 2024

Hi @ogabrielluiz,

What's the security policy of Langflow? I wonder whether the maintainers will patch the vulnerability and request a CVE.

Thank you.

@nicoloboschi
Contributor

I don't think any update to this component is worthwhile in terms of security. Even implementing a sandbox is not enough to actually prevent malicious users from accessing the system; there are too many ways to escape it.

Langflow admins must be aware of this and must not let arbitrary clients execute code.
Any component can be customized if you have access to the UI or to the API to import a flow.

Langflow flows must be considered application code; therefore it is not up to the Langflow runtime to provide those mitigations. This can be achieved by implementing authorization in the backend, which is lacking today but can "easily" be replaced by any HTTP proxy with custom rules.

@carlosrcoelho
Contributor

@0gur1

Do you need any assistance with this case? If not, please let us know if this issue can be closed.

@0gur1
Author

0gur1 commented Jul 30, 2024

I found a similar issue, #1973. It was also caused by the exec function,
and it has been tagged as a security issue and assigned a CVE.

I consider Langflow an LLM service that can be deployed on a cloud server. Take the official hosted service at https://astra.datastax.com as an example: users can log in to the server and create flows. In this scenario, an attacker can execute malicious code on the server with PythonCodeTool.

So I wonder if components in langflow can be executed in a safer way.

@YamonBot
Contributor

Upon further reflection, I think it is reasonable to manage what a component can do at the API call level, given that anything running within the system can be used. Imposing restrictions within the component can help somewhat, but it is not a fundamental solution: having access to the editor screen essentially grants full control. As I mentioned in the previous post, LangFlow has now introduced user-specific API keys, a slight security improvement over before. In a demo space with autologin set to true, all information can be stolen as mentioned in that post, but that is only a demo. At the product level, it should naturally be managed by an account-based administrator.

We occasionally see bots or hackers attempting to # on the addresses we use for development. Exposing the LangFlow endpoint on the internet is essentially equivalent to fully opening up the computer. It would be great to have stable component and flow management through source reviews by the maintainers at LangFlow Store. However, it is challenging to invest time and resources into that.

@carlosrcoelho
Contributor

#1973

You can deploy Langflow in backend-only (BE) mode with no UI; it is then up to the administrator/operator to put API restrictions in place based on their deployment environment.

5 participants