Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Replace exec with wasm_exec #5640

Closed
wants to merge 20 commits into from
Closed

Conversation

Jflick58
Copy link
Contributor

@Jflick58 Jflick58 commented Jun 2, 2023

This replaces the usage of the insecure exec function with a more secure library called wasm_exec TL;DR: Use chroot jails, wasmtime and a standalone Python3.11 WASM interpreter to safely execute arbitrary code

Some questions before I officially submit this pr:

  • How critical is it that we support the execution of code containing 3rd-party dependencies? The team behind the wasm interpreter I vendor with the library is working on porting numpy along with some of the c libs needed to enable broader package support.

    • If it is critical, then how critical is it to support Python versions below 3.11? Right now the interpreter does support the use of an existing venv for accessing installed 3rd party deps, but the interpreter is only distributed in 3.11 which means I would either a) have to only support 3.11 b) have a 3rd split in the PythonRepl code base (base, AST, Wasm) or b) try to write some hacky and unpythonic code to try to dynamically create a new 3.11-based venv off of the currently active Python environment.

    • If it is not critical then is it okay if I refactor the test cases to avoid the tests containing numpy and pandas code? Should we also update the docs to reflect that limitation?

I also strongly welcome feedback on the wasm_exec library if there are implementation ideas that make it more functional.

Fixes #1026
Fixes #5294
Fixes #5388
Fixes https://nvd.nist.gov/vuln/detail/CVE-2023-29374

Who can review?

@uogbuji
Copy link

uogbuji commented Jun 3, 2023

Just my opinions here. I've not commit bit & am relatively new to langchain.

This replaces the usage of the insecure exec function with a more secure library called wasm_exec TL;DR: Use chroot jails, wasmtime and a standalone Python3.11 WASM interpreter to safely execute arbitrary code

100% needed, and thanks!

Some questions before I officially submit this pr:

  • How critical is it that we support the execution of code containing 3rd-party dependencies? The team behind the wasm interpreter I vendor with the library is working on porting numpy along with some of the c libs needed to enable broader package support.

It's better to get this in place first, and then worry about 3rd party deps later. The security concern massively outweighs feature depth.

  • If it is critical, then how critical is it to support Python versions below 3.11? Right now the interpreter does support the use of an existing venv for accessing installed 3rd party deps, but the interpreter is only distributed in 3.11 which means I would either a) have to only support 3.11 b) have a 3rd split in the PythonRepl code base (base, AST, Wasm) or b) try to write some hacky and unpythonic code to try to dynamically create a new 3.11-based venv off of the currently active Python environment.

I also think it's not worth a lot of effort to support power Python versions. (b) & (c) feel like maintenance burdens, and simplicity is important in security-minded components.

@Jflick58 Jflick58 changed the title WIP: Replace exec with wasm_exec Replace exec with wasm_exec Jun 3, 2023
@Jflick58
Copy link
Contributor Author

Jflick58 commented Jun 6, 2023

@hwchase17 @vowelparrot would love some feedback on this PR. I recognize some of the tests are still failing, but I think that resolution is dependent upon answers to the questions I posed above. Thanks!

@Jflick58
Copy link
Contributor Author

@hwchase17 @vowelparrot bumping again, hoping to get your thoughts...

pyproject.toml Outdated
@@ -102,6 +102,7 @@ azure-cognitiveservices-speech = {version = "^1.28.0", optional = true}
py-trello = {version = "^0.19.0", optional = true}
momento = {version = "^1.5.0", optional = true}
bibtexparser = {version = "^1.4.0", optional = true}
wasm-exec = "^0.1.7"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be required dependeyncy, lets make optional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems dependent on the conversation below. Given that this is a big reason for the CVE that are flagging Langchain in different security scans, I'd advocate (for now) to make it a required dependency.

arbitrary_types_allowed = True


def _get_wasm_executor() -> WasmExecutor:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would probably bias towards a separate util tbh. since this introduces a new dependency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also goes back to the fundamental question of whether this should be an add-in or replace the default exec functionality. If it's the default replacement, then seems like leveraging the python util makes sense.

@@ -18,7 +18,7 @@


def _get_default_python_repl() -> PythonREPL:
return PythonREPL(_globals=globals(), _locals=None)
return PythonREPL(_globals=None, _locals=None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link

@uogbuji uogbuji Jun 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of this PR is to offer better security in use of Python REPL, and passing in globals() looks like a massive security hole to me. I see a lot of code out there where people expose e.g. their API keys and other secrets as globals in memory space, and those could be a prompt injection target. The rest of your comments, @hwchase17 seem to be towards making a higher security level optional, and I'd reluctantly accept that (for me, higher security should be the default), but if the user does opt for a tighter ship, then that _globals=None is IMO an important part of the picture.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed with @uogbuji but also right now the Wasm interpreter doesn't supported globals that can't be represented as a type that can be easily converted to string. Often the result of the globals function call are object types that can't be converted to string.

@hwchase17 hwchase17 added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Jun 24, 2023
@vercel
Copy link

vercel bot commented Jul 3, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Nov 1, 2023 11:18pm

@Jflick58
Copy link
Contributor Author

Jflick58 commented Jul 3, 2023

@hwchase17 can you review the responses to the comments and let me know how you want to proceed?

@vercel vercel bot temporarily deployed to Preview July 3, 2023 06:52 Inactive
@timxieICN
Copy link

any update on this feature/PR? Thanks.

@hwchase17
Copy link
Contributor

sorry for the delay here. high level thoughts:

  1. We almost certainly don't want to introduce this as a dependency for the main project
  2. We want to support as flexible of python running for prototyping purposes as possible
  3. We do not want people to be running this in production

I would suggest the following changes:

  1. We will move the current python tools and anything that depends on them into langchain.experimental and make that a separate package. This should remove CVEs from core langchain, while also hopefullying making it clear its experimental, and also allowing for prototyping
  2. I would encourage the creation of a separate langchain_python or the like respository where that uses wasm by default. we'd be happy to support this in a variety of ways (help setting up, help publicizing, help maintaining), but i think we'd prefer it to be a separate repository

thoughts?

@Jflick58
Copy link
Contributor Author

Responded more in #8092

As far as this goes, I'll probably take these changes and implement as a separate package. How would that jive with the story of langchain.experimental though? Seems confusing to me to have an unsafe and safe way of doing things, both as external packages and possibly publicized by LangChain.

@leo-gan
Copy link
Collaborator

leo-gan commented Sep 15, 2023

@Jflick58 Hi , could you, please, resolve the merging issues? After that ping me and I push this PR for the review. Thanks!

@Jflick58
Copy link
Contributor Author

@leo-gan I thought @hwchase17 decided to move the repl functionality to langchain-experimental to avoid including this security issue in the main langchain install? Happy to work on this PR to include the wasm-exec functionality in the main package, but wanted to confirm. Thanks!

@leo-gan
Copy link
Collaborator

leo-gan commented Sep 16, 2023

@Jflick58 If this change suits experimental better, is it possible to start this change again as a new PR in experimental? If you want to do this, please, close this PR.

@EliahKagan
Copy link

EliahKagan commented Sep 16, 2023

Would it have to be a new PR? Given that langchain and langchain-experimental are both from this repository, could this PR be modified accordingly?

@leo-gan
Copy link
Collaborator

leo-gan commented Sep 17, 2023

It is up to you. This PR could be OK 👍

@ruan-azevedo
Copy link

Hey guys, any updates for this PR?

@Jflick58
Copy link
Contributor Author

Jflick58 commented Oct 3, 2023

@RuanAzevedo This is still on my to-do list. I'm traveling for work this week but hoping to pick it back up next week.

@hwchase17 hwchase17 removed the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Oct 4, 2023
@@ -66,6 +78,6 @@ def run(self, command: str, timeout: Optional[int] = None) -> str:
p.terminate()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the timeout branch also needs to be updated to pass self.wasm as arg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we pass self.wasm as an arg if it has access to the wasm class attribute?

@tabdunabi
Copy link

hi @Jflick58, any update on this PR (to resolve this CVE)?

@hwchase17 hwchase17 closed this Jan 30, 2024
@baskaryan baskaryan reopened this Jan 30, 2024
@kenziewritescode
Copy link

Is there any update on this?

@eyurtsev eyurtsev added 13 security 🤖:security Related to security issues, CVEs labels Mar 13, 2024
@hwchase17
Copy link
Contributor

closing as the CVE has been addressed through other means. i am supported of a wasm based exec tool in the future

@hwchase17 hwchase17 closed this May 9, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
13 security 🤖:security Related to security issues, CVEs
Projects
None yet