[ZEPPELIN-6121] Write a Dockerfile for python interpreter image build #4865

Open. Wants to merge 3 commits into base: master
56 changes: 56 additions & 0 deletions python/Dockerfile
@@ -0,0 +1,56 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

FROM openjdk:11 as builder

COPY . /zeppelin/

WORKDIR /zeppelin

RUN chmod +x ./mvnw

RUN ./mvnw clean package -am -pl zeppelin-interpreter-shaded,zeppelin-interpreter,python -DskipTests

Contributor

I wonder whether the image should support %python, %python.ipython, and %python.sql by default.
All three are listed in the overview section of the Zeppelin Python interpreter documentation, which also recommends IPython.

For IPython, adding the following command here would install the necessary packages:

Suggested change
RUN pip install jupyter-client grpcio protobuf~=3.20 ipython ipykernel

Contributor Author

@tbonelee
Thank you for your comment. I agree and have applied your suggestion. That said, additional libraries such as pandas still would not be installed. In my opinion, we need a way to inject these libraries without rewriting the Dockerfile and redeploying the image. I think it would be better to address this in a separate task.

Contributor

I agree with your point.
I believe that handling additional packages, like pandas, would require a more flexible approach, such as a conda-like package management system. It might be better to address this in a separate PR later.


FROM openjdk:11

RUN apt-get update && \
    apt-get install -y python3 python3-pip && \
    pip3 install jupyter-client grpcio protobuf~=3.20 ipython ipykernel && \
    ln -s /usr/bin/python3 /usr/bin/python && \
    rm -rf /var/lib/apt/lists/*

COPY --from=builder /zeppelin/bin /zeppelin/bin/
COPY --from=builder /zeppelin/conf /zeppelin/conf

COPY --from=builder /zeppelin/interpreter/python /zeppelin/interpreter/python
COPY --from=builder /zeppelin/zeppelin-interpreter-shaded/target /zeppelin/zeppelin-interpreter-shaded/target

WORKDIR /zeppelin

ENV PYTHON_INTERPRETER_PORT=8084

RUN chmod +x ./bin/interpreter.sh

CMD ./bin/interpreter.sh \
    -d ./interpreter/python \
    -c host.docker.internal \
    -p "${INTERPRETER_EVENT_SERVER_PORT}" \
    -r "${PYTHON_INTERPRETER_PORT}:${PYTHON_INTERPRETER_PORT}" \
    -i python-shared_process \
    -l ./local-repo \
    -g python
32 changes: 32 additions & 0 deletions python/README.md
@@ -64,3 +64,35 @@ Current interpreter delegate the whole work to ipython kernel via `jupyter_clien
Zeppelin interpreter process will communicate with the python process via `grpc`. Ideally every feature works in IPython should work in Zeppelin as well.


## Run the interpreter with Docker
You can run the Python interpreter as a standalone Docker container.

### Step 1. Specify the configuration for the interpreter
```bash
# conf/interpreter.json

"python": {
  ...
  "option": {
    "remote": true,
    "port": {INTERPRETER_PROCESS_PORT_IN_HOST},
    "isExistingProcess": true,
    "host": "localhost",
    ...
  }
}
```
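
Before starting the server, it can help to confirm that the edited file really contains the two flags from Step 1. The following is a minimal sketch, not part of Zeppelin: `has_existing_process_flags` is a hypothetical helper name, and it only does a text match rather than parsing JSON, so it cannot tell which interpreter's `option` block the flags belong to.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not shipped with Zeppelin): succeeds only if the given
# file contains both `"remote": true` and `"isExistingProcess": true`.
# This is a crude text check; it does not parse the JSON structure.
has_existing_process_flags() {
  local conf="$1"
  grep -Eq '"isExistingProcess"[[:space:]]*:[[:space:]]*true' "$conf" &&
    grep -Eq '"remote"[[:space:]]*:[[:space:]]*true' "$conf"
}

# Example usage (the path is the one named in Step 1):
#   has_existing_process_flags conf/interpreter.json && echo "flags present"
```

If the check fails, revisit the `option` block of the `python` interpreter setting before moving on to Step 2.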

### Step 2. Build and run the interpreter
```bash
zeppelin $ ./mvnw clean install -DskipTests

zeppelin $ ./bin/zeppelin-daemon.sh start # Start the Zeppelin server.
# Check the port of the interpreter event server: look for the log message that starts with "InterpreterEventServer is starting at".

zeppelin $ docker build -f ./python/Dockerfile -t python-interpreter .

zeppelin $ docker run -p {INTERPRETER_PROCESS_PORT_IN_HOST}:8084 \
    -e INTERPRETER_EVENT_SERVER_PORT={INTERPRETER_EVENT_SERVER_PORT} \
    python-interpreter
```
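
Looking up the event server port by hand can be scripted. The sketch below assumes the log message ends with `<host>:<port>` (for example `InterpreterEventServer is starting at 10.0.0.5:45123`); that format, the helper name `event_server_port`, and the log path in the usage comment are assumptions, so adjust them to what your server actually writes.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the port from the most recent
# "InterpreterEventServer is starting at <host>:<port>" line in a log file.
# Prints nothing if the log contains no such line.
event_server_port() {
  local log_file="$1"
  grep 'InterpreterEventServer is starting at' "$log_file" \
    | tail -n 1 \
    | sed -E 's/.*InterpreterEventServer is starting at [^ ]*:([0-9]+).*/\1/'
}

# Example usage (log path and host port are placeholders, not confirmed values):
#   port="$(event_server_port logs/zeppelin-server.log)"
#   docker run -p {INTERPRETER_PROCESS_PORT_IN_HOST}:8084 \
#       -e INTERPRETER_EVENT_SERVER_PORT="${port}" \
#       python-interpreter
```

Taking the last matching line matters when the server has been restarted several times, since each start may log a different port.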