docker image with ocr enabled #54

Open
wants to merge 1 commit into
base: master
23 changes: 23 additions & 0 deletions docd/debian_ocr.sh
@@ -0,0 +1,23 @@
#!/usr/bin/env sh

# Alternative Debian-based build with OCR enabled.

# The build context must be the GOPATH that contains both docconv and gosseract.

# The build runs inside the Docker image, which is more reliable when the host
# OS is not Linux.

export NAME=docd
export VERSION=debian
# Quoted so that a GOPATH containing spaces is not split by older shells.
export DOCKERFILE="$GOPATH/src/code.sajari.com/docconv/docd/debian_ocr/Dockerfile"

echo "Building ${NAME} for ${VERSION} with OCR enabled..."

echo "GOPATH: ${GOPATH}"

echo "Dockerfile: ${DOCKERFILE}"

# Quote every expansion so that paths containing spaces are passed intact.
docker build \
    -t "${NAME}:ocr" \
    -f "${DOCKERFILE}" \
    "${GOPATH}"
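Because the script uses `$GOPATH` both as the build context and as the root of the Dockerfile path, an unset or wrong `GOPATH` only surfaces once `docker build` starts. The following is a minimal sketch of a pre-flight check, assuming a hypothetical helper named `check_build_context` that is not part of this PR:

```shell
#!/usr/bin/env sh

# Hypothetical helper (not part of this PR): verify that a directory looks like
# the GOPATH that debian_ocr.sh expects before it is passed to `docker build`.
# Returns 0 when the Dockerfile is found under the given GOPATH, 1 otherwise.
check_build_context() {
    gopath="$1"
    if [ -z "$gopath" ]; then
        echo "GOPATH is not set" >&2
        return 1
    fi
    dockerfile="$gopath/src/code.sajari.com/docconv/docd/debian_ocr/Dockerfile"
    if [ ! -f "$dockerfile" ]; then
        echo "Dockerfile not found: $dockerfile" >&2
        return 1
    fi
    return 0
}
```

One way to use it would be to call `check_build_context "${GOPATH:-}" || exit 1` before the `docker build` step, so the script stops with a readable message instead of a Docker error.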
27 changes: 27 additions & 0 deletions docd/debian_ocr/Dockerfile
@@ -0,0 +1,27 @@
FROM debian

# Run update and install in the same layer so that a cached, stale package
# index is never reused for a later install.
RUN apt-get update && apt-get install -y \
    zip \
    poppler-utils \
    wv \
    unrtf \
    tidy \
    lynx \
    libtesseract-dev \
    libleptonica-dev \
    tesseract-ocr-eng \
    git \
    golang \
    && rm -rf /var/lib/apt/lists/*

# Build context must be the host GOPATH
COPY . /goworkspace

WORKDIR /goworkspace/src/code.sajari.com/docconv/docd

ENV GOPATH=/goworkspace

RUN GOOS=linux GOARCH=amd64 go build -tags ocr -o /docd

EXPOSE 8888

CMD ["/docd"]
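Note that `EXPOSE 8888` only documents the port; the port is not reachable from the host unless `docker run` publishes it with `-p`. The following is a minimal sketch of starting the image built by `debian_ocr.sh` (tagged `docd:ocr`), assuming a hypothetical `run_docd` function and a hypothetical `DOCKER` override variable, neither of which is part of this PR:

```shell
#!/usr/bin/env sh

# Hypothetical helper (not part of this PR): start the docd:ocr image and
# publish container port 8888 on the given host port (default 8888).
# The DOCKER variable is an assumption added here so that the Docker binary
# can be swapped out, for example for a dry run with DOCKER=echo.
run_docd() {
    host_port="${1:-8888}"
    ${DOCKER:-docker} run --rm -p "${host_port}:8888" docd:ocr
}
```

Calling `run_docd 9000` would make the service available on host port 9000, while `DOCKER=echo run_docd 9000` only prints the arguments that would be passed to Docker.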