This repository contains sample ready-made steps for titanoboa:
🧬 Bioinformatics 🔬
- 🧬 K-mer Count
Provides functions to list, start and stop EC2 instances. It primarily uses the amazonica library; refer to that library's documentation for details on the supported properties.
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.aws.ec2
io.titanoboa.tasklet.aws.ec2/list-instances
{:type :aws-ec2-list,
:supertype :tasklet,
:description "Lists all EC2 instances for all reservations.\nReturns :ec2-instances key with list of instances as a value:\n{:ec2-instances [{instance1 map} {instance2 map} ...]}",
:properties {:credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}},
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.ec2/list-instances", :type "clojure"}}
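Because the list step returns its result under the :ec2-instances key, a later step could pick out the ids it needs and pass them on as :instance-ids. The following is only a sketch over mock data: the :instance-id and :state keys inside each instance map are assumptions about the shape of the returned maps, and `instance-ids-in-state` is a hypothetical helper, not part of this repository.

```clojure
;; Mock result of the list step. The keys inside each instance map
;; (:instance-id, :state) are assumed, not taken from this repository.
(def step-result
  {:ec2-instances [{:instance-id "i-0a123a454b678aeb6" :state {:name "running"}}
                   {:instance-id "i-0b987b654c321bfa7" :state {:name "stopped"}}]})

(defn instance-ids-in-state
  "Returns a vector of ids of the instances whose state name equals `state-name`."
  [result state-name]
  (->> (:ec2-instances result)
       (filter #(= state-name (get-in % [:state :name])))
       (mapv :instance-id)))

;; Ids that could be handed to a stop step as :instance-ids
(instance-ids-in-state step-result "running")
```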
io.titanoboa.tasklet.aws.ec2/start-instances
{:type :aws-ec2-start,
:supertype :tasklet,
:description "Starts an EC2 instance.\nReturns :starting-instances key with status value map.",
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.ec2/start-instances", :type "clojure"}
:properties {:credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :instance-ids ["i-0a123a454b678aeb6"]}
}
io.titanoboa.tasklet.aws.ec2/stop-instances
{:type :aws-ec2-stop,
:supertype :tasklet,
:description "Stops an EC2 instance.\nReturns :stopping-instances key with status value map.",
:properties {:credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :instance-ids ["i-0a123a454b678aeb6"]},
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.ec2/stop-instances", :type "clojure"}
}
Provides functions to read, download and upload S3 objects. It primarily uses the amazonica library; refer to that library's documentation for details on the supported properties.
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.aws.s3
io.titanoboa.tasklet.aws.s3/read
{:type :aws-s3-read,
:supertype :tasklet,
:description "Reads textual content of a s3 file and returns it as a job property :s3-object",
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.s3/read", :type "clojure"}
:properties {:key "index.html", :credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :bucket ""}}
io.titanoboa.tasklet.aws.s3/download
{:type :aws-s3-download,
:supertype :tasklet,
:description "Downloads a file from s3 bucket to job directory under the specified name.",
:properties {:key "index.html", :credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :save-as "path/to/file", :bucket "bucket-name"},
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.s3/download", :type "clojure"}}
io.titanoboa.tasklet.aws.s3/upload
{:type :aws-s3-upload,
:supertype :tasklet,
:description "Uploads specified file from job directory into the given s3 bucket.",
:properties {:key "index.bkp", :credentials {:access-key "", :secret-key "", :endpoint "eu-central-1"}, :file-path "index.html", :bucket ""},
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.s3/upload", :type "clojure"}}
Provides functions to send email via AWS SES. It primarily uses the amazonica library; refer to that library's documentation for details on the supported properties.
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.aws.ses
io.titanoboa.tasklet.aws.ses/send-email
{:type :aws-ses,
:supertype :tasklet,
:description "Sends an email via SES.\nReturns :message-id key with message id value.\n",
:properties {:credentials {:access-key "", :secret-key "", :endpoint "eu-west-1"}, :from "info@titanoboa.io",
:message {:body {:html "testing 1-2-3-4", :text "testing 1-2-3-4"}, :subject "greetings from titanoboa"}, :to ["miro@titanoboa.io"]},
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.aws.ses/send-email", :type "clojure"}}
Provides functions to send notifications via AWS SNS. It primarily uses the amazonica library; refer to that library's documentation for details on the supported properties.
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.aws.sns
io.titanoboa.tasklet.aws.sns/publish
{:type :aws-sns,
:supertype :tasklet,
:description "Publishes a message into an SNS topic.",
:workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.aws.sns/publish",
:type "clojure"},
:properties {:topic-arn "arn:aws:sns:us-east-1:676820690883:my-topic",
:subject "test",
:message "",
:message-attributes {"attr" "value"}}}
Provides functions to send messages via AWS SQS. It primarily uses the amazonica library; refer to that library's documentation for details on the supported properties.
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.aws.sqs
io.titanoboa.tasklet.aws.sqs/send-message
{:type :aws-sqs,
:supertype :tasklet,
:description "Sends a text message to a queue.",
:workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.aws.sqs/send-message",
:type "clojure"},
:properties {:credentials {:access-key "",
:secret-key "",
:endpoint "eu-central-1"},
:message-attributes {},
:message-body "",
:queue-url ""}}
Performs a JDBC query and returns the corresponding data. Note that the code of the jdbc tasklet is part of the standard Titanoboa distribution and is not in this repository.
- Add whatever JDBC driver you need to titanoboa's ./lib folder.
- Require the namespace titanoboa.tasklet.jdbc in titanoboa's external dependencies file. You may also need to require titanoboa.system.jdbc (see the next item).
- Do not forget to also define and configure the corresponding jdbc system for DB connection pooling in your server configuration (in this example there is a connection pool system :test-db that uses titanoboa.system.jdbc/jdbc-pool).
titanoboa.tasklet.jdbc/query
{:type :jdbc
:supertype :tasklet
:workload-fn #titanoboa.exp/Expression {:value "titanoboa.tasklet.jdbc/query"}
:properties {:response-property-name :db-data
:data-source-ks [:test-db :system :pool]
:query {:select [:o.ordernumber :o.TotalAmount :c.FirstName :c.LastName :c.City :c.Country],
:from [[:customers :c]]
:left-join [[:orders :o] [:= :c.id :o.customerid]]
:order-by [[:o.totalamount :desc :nulls-last]]
:limit 50}}}
Expected step properties are as follows:
- :query - either a query string or a map in honeysql format
- :data-source-ks - sequence of keys pointing to the JDBC data source object among the running systems. When used with titanoboa.system.jdbc/jdbc-pool the format is [:<jdbc pool system> :system :pool], so e.g. if the jdbc system is :test-db then it is [:test-db :system :pool]
- :response-property-name - name of the job property under which the returned data is stored (:db-data in the example above)
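To illustrate how a key sequence such as [:test-db :system :pool] addresses a nested value, here is a small sketch using plain Clojure maps. The `running-systems` map and the `resolve-data-source` helper are hypothetical stand-ins; how titanoboa actually stores its running systems and resolves the key sequence is an assumption here.

```clojure
;; Hypothetical stand-in for the running systems. A real pool would be a
;; connection-pool object, not a keyword.
(def running-systems
  {:test-db {:system {:pool :placeholder-pool}}})

(defn resolve-data-source
  "Looks up the value addressed by the key sequence `ks`; nil if absent."
  [systems ks]
  (get-in systems ks))

(resolve-data-source running-systems [:test-db :system :pool])
```

If any key along the path is missing, `get-in` returns nil, so a misspelled system name would surface as a missing data source rather than as an exception at lookup time.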
Makes an http(s) call and returns the (parsed) response. It primarily uses the clj-http library; refer to that library's documentation for details on all supported properties.
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.httpclient
io.titanoboa.tasklet.httpclient/request
{:type :http-client
:supertype :tasklet
:workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.httpclient/request" :type "clojure"}
:properties {:url "https://jsonplaceholder.typicode.com/posts/1"
:request-method :get
:as :json
:proxy-host "127.0.0.1"
:proxy-port 8118
:response-property-name :rest-response
:body-only? false
:connection-pool {:timeout 5 :threads 4 :insecure? false :default-per-route 10}}}
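As a variation on the GET example, a POST step might look like the fragment below. This is a sketch only: it assumes the tasklet forwards :content-type and :form-params to clj-http unchanged (both are standard clj-http request options), and that :body-only? true limits the stored response to its body, which is inferred from the property name rather than confirmed here.

```clojure
{:type :http-client
 :supertype :tasklet
 :workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.httpclient/request" :type "clojure"}
 :properties {:url "https://jsonplaceholder.typicode.com/posts"
              :request-method :post
              ;; assumed to be passed through to clj-http as-is
              :content-type :json
              :form-params {:title "foo" :body "bar" :userId 1}
              :as :json
              :response-property-name :rest-response
              ;; assumption: keep only the response body
              :body-only? true}}
```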
Sends email via SMTP. It primarily uses the postal library; refer to that library's documentation for details on all supported properties.
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.smtp
titanoboa.tasklet.smtp/send
{:type :smtp
:supertype :tasklet
:workload-fn #titanoboa.exp/Expression{:value "titanoboa.tasklet.smtp/send"}
:properties {:connection {:host "localhost"
:port 25
:user ""
:pass ""
:ssl false
:tls false}
:email {:from "miro@example.bla"
:to "joe@example.com"
:cc ["joe@example.com", "jim@example.com", "jeff@example.com"]
:bcc "archive@example.com"
:subject "Cat!"
:date #titanoboa.exp/Expression{:value "(java.util.Date.)"}
:message-id ""
:user-agent ""
:body [{:type "text/plain"
:content "Hey folks,\n\nCheck out these pictures of my cat!"}
{:type :inline
:content #titanoboa.exp/Expression{:value "(File. \"/tmp/lester-flying-photoshop\")"}
:content-type "image/jpeg"
:file-name "lester-flying.jpeg"}
{:type :attachment
:content #titanoboa.exp/Expression{:value "(File. \"/tmp/lester-upside-down.jpeg\")"}}]}}}
SSH and SFTP client. It primarily uses the clj-ssh library; refer to that library's documentation for details on all supported properties.
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.ssh
io.titanoboa.tasklet.ssh/ssh
{:type :ssh,
:supertype :tasklet,
:description "SSH Client",
:properties {:ssh-agent-settings {:use-system-ssh-agent false},
:identities {:private-key-path "/path/to/key.pem"},
:ssh-cmd-map {:in "echo hello"},
:host "xxx.eu-central-1.compute.amazonaws.com",
:session-options {:username "ec2-user", :strict-host-key-checking "no", :preferred-authentications "publickey"}},
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.ssh/ssh", :type "clojure"}}
io.titanoboa.tasklet.ssh/sftp
{:type :sftp,
:supertype :tasklet,
:description "SFTP Client",
:properties {:ssh-agent-settings {:use-system-ssh-agent false},
:identities {:private-key-path "/path/to/key.pem"},
:sftp-cmds-vec [[:ls "/home/ec2-user/"]],
:host "xxx.eu-central-1.compute.amazonaws.com",
:session-options {:username "ec2-user",
:strict-host-key-checking "no",
:preferred-authentications "publickey"}},
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.ssh/sftp", :type "clojure"}}
Generates a PDF file based on job properties. It primarily uses the clj-pdf library; refer to that library's documentation for details on the generation process and all supported properties.
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.pdf
io.titanoboa.tasklet.pdf/generate-pdf
{:pdf-sections [[:list {:roman true}
[:chunk {:style :bold} "a bold item"]
"another item"
"yet another item"]
[:phrase "some text"]
[:phrase "some more text"]
[:paragraph "yet more text"]]
:file-name "example.pdf"
:pdf-metadata {:bottom-margin 10, :creator "Jane Doe", :doc-header ["inspired by" "William Shakespeare"], :right-margin 50, :left-margin 10, :footer "page", :header "page header", :size "a4", :title "Test doc", :author "John Doe", :top-margin 20, :subject "Some subject"}}
{:type :pdf-generation
:supertype :tasklet
:properties
{:pdf-sections [[:list {:roman true}
[:chunk {:style :bold} "a bold item"]
"another item"
"yet another item"]
[:phrase "some text"]
[:phrase "some more text"]
[:paragraph "yet more text"]]
:file-name "example.pdf"
:pdf-metadata {:bottom-margin 10, :creator "Jane Doe", :doc-header ["inspired by" "William Shakespeare"], :right-margin 50, :left-margin 10, :footer "page", :header "page header", :size "a4", :title "Test doc", :author "John Doe", :top-margin 20, :subject "Some subject"}}
:workload-fn #titanoboa.exp.Expression{:value "io.titanoboa.tasklet.pdf/generate-pdf", :type "clojure"}}
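Since :pdf-sections is plain data, the properties could also be assembled programmatically, for example by an earlier step. The sketch below builds the same kind of structure as the example above; `roman-list` and `pdf-properties` are hypothetical helper names and are not part of the tasklet.

```clojure
(defn roman-list
  "Builds a :list element with roman numbering; the first item is bold."
  [[first-item & more]]
  (into [:list {:roman true} [:chunk {:style :bold} first-item]] more))

(defn pdf-properties
  "Assembles step properties from a file name, list items and paragraph texts."
  [file-name items paragraphs]
  {:file-name file-name
   :pdf-sections (into [(roman-list items)]
                       (map (fn [text] [:paragraph text]) paragraphs))
   :pdf-metadata {:title "Test doc" :size "a4"}})

(pdf-properties "example.pdf"
                ["a bold item" "another item" "yet another item"]
                ["yet more text"])
```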
A simple Kafka producer and consumer. It primarily uses the dvlopt/kafka library; refer to that library's documentation for details on all supported properties.
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.kafka
io.titanoboa.tasklet.kafka/produce
{:type :kafka-produce,
:supertype :tasklet,
:workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.kafka/produce",
:type "clojure"},
:properties {:kafka-producer-config {:dvlopt.kafka/nodes [["localhost"
9092]],
:dvlopt.kafka/serializer.key :long,
:dvlopt.kafka/serializer.value :string,
:dvlopt.kafka.out/configuration {"client.id" "my-producer",
"transactional.id" "some transaction id"}},
:records [{:topic "test-topic",
:key 123,
:value "Hello World!"}]}}
io.titanoboa.tasklet.kafka/consume
{:type :kafka-consume,
:supertype :tasklet,
:workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.kafka/consume",
:type "clojure"},
:properties {:kafka-topics ["test-topic"],
:poll-options {:dvlopt.kafka/timeout [1
:seconds]},
:kafka-consumer-config {:dvlopt.kafka/nodes [["localhost"
9092]],
:dvlopt.kafka/deserializer.key :long,
:dvlopt.kafka/deserializer.value :string,
:dvlopt.kafka.in/configuration {"auto.offset.reset" "earliest",
"enable.auto.commit" false,
"max.poll.records" "50",
"group.id" "my-group"}}}}
A few simple functions to help with K-mer counting and analysis of FASTQ data files. Also contains functions for splitter (map) and aggregator (reduce) steps to help with parallel processing.
Note that some thought needs to go into which underlying file system will be used (e.g. HDFS, EFS) and whether the file will be physically split before counting.
- Add the following Maven coordinates to titanoboa's external dependencies file:
- Require namespace:
io.titanoboa.tasklet.kmer
io.titanoboa.tasklet.kmer/kmer-count
{:create-folder? false,
:fastq-file "/path/to/fastq/file",
:start 0,
:end 12,
:k 3,
:top-n 10}
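For readers unfamiliar with the technique, the following sketch shows K-mer counting in plain Clojure. It is not the implementation from io.titanoboa.tasklet.kmer: all function names are hypothetical, and it relies only on the standard FASTQ layout in which each record spans four lines with the sequence on the second line.

```clojure
(defn fastq-sequences
  "Returns the sequence line (2nd of every 4 lines) of each FASTQ record."
  [lines]
  (take-nth 4 (rest lines)))

(defn kmer-counts
  "Counts every overlapping substring of length k in the sequence s."
  [k s]
  (->> (partition k 1 s)
       (map #(apply str %))
       frequencies))

(defn count-kmers-in-records
  "Sums the k-mer counts over all records in a seq of FASTQ lines."
  [k lines]
  (->> (fastq-sequences lines)
       (map #(kmer-counts k %))
       (apply merge-with + {})))

(defn top-n
  "Returns the n most frequent k-mers as [kmer count] pairs,
  with ties ordered alphabetically."
  [n counts]
  (->> counts
       (sort-by (fn [[kmer cnt]] [(- cnt) kmer]))
       (take n)
       vec))

(def sample-lines
  ["@r1" "ACGTA" "+" "IIIII"
   "@r2" "CGTAC" "+" "IIIII"])

(top-n 2 (count-kmers-in-records 3 sample-lines))
```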
io.titanoboa.tasklet.kmer/split-fastq
io.titanoboa.tasklet.kmer/reduce-kmers
{:fastq-file "/path/to/fastq/file",
:k 3,
:split-to 12}
{:first-step "splitter",
:name "kmer-map-reduce",
:revision 4,
:type nil,
:properties {:fastq-file "/mnt/efs/sars2/reclojure.fastq",
:k 3,
:split-to 12,
:top-n 10},
:steps [{:id "splitter",
:type :map,
:supertype :map,
:next [["*" "aggregator"]],
:workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.kmer/split-fastq",
:type "clojure"},
:properties {:jobdef-name "k-mer-count",
:sys-key :core,
:standalone-system? false},
:revision 1}
{:id "aggregator",
:type :reduce,
:supertype :reduce,
:workload-fn #titanoboa.exp/Expression{:value "io.titanoboa.tasklet.kmer/reduce-kmers",
:type "clojure"},
:next [],
:properties {:map-step-id "splitter", :commit-interval 100},
:revision 1}]}
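The map-reduce shape of this workflow can be sketched in plain Clojure as follows. This is an illustration only: `split-ranges` and `merge-counts` are hypothetical names, and whether split-fastq divides the file by records, lines or bytes is not stated here, so the chunk bounds below are an assumption.

```clojure
(defn split-ranges
  "Splits the half-open range [0, total) into at most `parts`
  contiguous [start end] chunks of near-equal size."
  [total parts]
  (let [size (max 1 (long (Math/ceil (/ total (double parts)))))]
    (vec (for [start (range 0 total size)]
           [start (min total (+ start size))]))))

(defn merge-counts
  "Reduce side: sums the partial k-mer count maps produced per chunk."
  [partials]
  (apply merge-with + {} partials))

;; 10 records split three ways, then two partial results merged
(split-ranges 10 3)
(merge-counts [{"ACG" 2 "CGT" 1} {"CGT" 3 "TAC" 1}])
```

Merging with `merge-with +` is associative and commutative, so partial results can be combined in whatever order the parallel branches finish.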