-
Notifications
You must be signed in to change notification settings - Fork 16.1k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Replace exec with wasm_exec #5640
Conversation
Just my opinions here. I've not commit bit & am relatively new to langchain.
100% needed, and thanks!
It's better to get this in place first, and then worry about 3rd party deps later. The security concern massively outweighs feature depth.
I also think it's not worth a lot of effort to support power Python versions. (b) & (c) feel like maintenance burdens, and simplicity is important in security-minded components. |
@hwchase17 @vowelparrot would love some feedback on this PR. I recognize some of the tests are still failing, but I think that resolution is dependent upon answers to the questions I posed above. Thanks! |
@hwchase17 @vowelparrot bumping again, hoping to get your thoughts... |
pyproject.toml
Outdated
@@ -102,6 +102,7 @@ azure-cognitiveservices-speech = {version = "^1.28.0", optional = true} | |||
py-trello = {version = "^0.19.0", optional = true} | |||
momento = {version = "^1.5.0", optional = true} | |||
bibtexparser = {version = "^1.4.0", optional = true} | |||
wasm-exec = "^0.1.7" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should not be required dependeyncy, lets make optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems dependent on the conversation below. Given that this is a big reason for the CVE that are flagging Langchain in different security scans, I'd advocate (for now) to make it a required dependency.
langchain/utilities/python.py
Outdated
arbitrary_types_allowed = True | ||
|
||
|
||
def _get_wasm_executor() -> WasmExecutor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would probably bias towards a separate util tbh. since this introduces a new dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this also goes back to the fundamental question of whether this should be an add-in or replace the default exec functionality. If it's the default replacement, then seems like leveraging the python util makes sense.
langchain/tools/python/tool.py
Outdated
@@ -18,7 +18,7 @@ | |||
|
|||
|
|||
def _get_default_python_repl() -> PythonREPL: | |||
return PythonREPL(_globals=globals(), _locals=None) | |||
return PythonREPL(_globals=None, _locals=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of this PR is to offer better security in use of Python REPL, and passing in globals()
looks like a massive security hole to me. I see a lot of code out there where people expose e.g. their API keys and other secrets as globals in memory space, and those could be a prompt injection target. The rest of your comments, @hwchase17 seem to be towards making a higher security level optional, and I'd reluctantly accept that (for me, higher security should be the default), but if the user does opt for a tighter ship, then that _globals=None
is IMO an important part of the picture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed with @uogbuji but also right now the Wasm interpreter doesn't supported globals that can't be represented as a type that can be easily converted to string. Often the result of the globals function call are object types that can't be converted to string.
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
@hwchase17 can you review the responses to the comments and let me know how you want to proceed? |
any update on this feature/PR? Thanks. |
sorry for the delay here. high level thoughts:
I would suggest the following changes:
thoughts? |
Responded more in #8092 As far as this goes, I'll probably take these changes and implement as a separate package. How would that jive with the story of |
@Jflick58 Hi , could you, please, resolve the merging issues? After that ping me and I push this PR for the review. Thanks! |
@leo-gan I thought @hwchase17 decided to move the repl functionality to langchain-experimental to avoid including this security issue in the main langchain install? Happy to work on this PR to include the |
@Jflick58 If this change suits experimental better, is it possible to start this change again as a new PR in experimental? If you want to do this, please, close this PR. |
Would it have to be a new PR? Given that |
It is up to you. This PR could be OK 👍 |
Hey guys, any updates for this PR? |
@RuanAzevedo This is still on my to-do list. I'm traveling for work this week but hoping to pick it back up next week. |
@@ -66,6 +78,6 @@ def run(self, command: str, timeout: Optional[int] = None) -> str: | |||
p.terminate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the timeout branch also needs to be updated to pass self.wasm
as arg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we pass self.wasm as an arg if it has access to the wasm class attribute?
Is there any update on this? |
closing as the CVE has been addressed through other means. i am supported of a wasm based exec tool in the future |
This replaces the usage of the insecure
exec
function with a more secure library called wasm_exec TL;DR: Usechroot
jails,wasmtime
and a standalone Python3.11 WASM interpreter to safely execute arbitrary codeSome questions before I officially submit this pr:
How critical is it that we support the execution of code containing 3rd-party dependencies? The team behind the wasm interpreter I vendor with the library is working on porting
numpy
along with some of the c libs needed to enable broader package support.If it is critical, then how critical is it to support Python versions below 3.11? Right now the interpreter does support the use of an existing venv for accessing installed 3rd party deps, but the interpreter is only distributed in 3.11 which means I would either a) have to only support 3.11 b) have a 3rd split in the PythonRepl code base (base, AST, Wasm) or b) try to write some hacky and unpythonic code to try to dynamically create a new 3.11-based venv off of the currently active Python environment.
If it is not critical then is it okay if I refactor the test cases to avoid the tests containing
numpy
andpandas
code? Should we also update the docs to reflect that limitation?I also strongly welcome feedback on the wasm_exec library if there are implementation ideas that make it more functional.
Fixes #1026
Fixes #5294
Fixes #5388
Fixes https://nvd.nist.gov/vuln/detail/CVE-2023-29374
Who can review?