Merge pull request #17 from ChinthapallyAkanksha/spark-3.3
Updated Quenya-Dsl to support special characters for spark-3.3 branch
mantovani authored Nov 7, 2022
2 parents 7ca75b6 + abe8e81 commit 07e4091
Showing 11 changed files with 245 additions and 78 deletions.
@@ -28,8 +28,8 @@ jobs:
cache: sbt
- name: Build and test scala version
run: |
PGPASSWORD="postgres" psql -c 'create database almaren;' -U postgres -h localhost
PGPASSWORD="postgres" psql -c "ALTER USER postgres PASSWORD 'foo' ;" -U postgres -h localhost
PGPASSWORD="postgres" psql -c 'create database almaren;' -U postgres -h localhost
PGPASSWORD="postgres" psql -c "ALTER USER postgres PASSWORD 'postgres' ;" -U postgres -h localhost
PGPASSWORD="postgres" psql -c 'create role runner;' -U postgres -h localhost
PGPASSWORD="postgres" psql -c 'ALTER ROLE "runner" WITH LOGIN SUPERUSER INHERIT CREATEDB CREATEROLE REPLICATION;' -U postgres -h localhost
sbt ++2.12.10 test
86 changes: 65 additions & 21 deletions README.md
@@ -1,28 +1,24 @@
# Quenya DSL
# Quenya-DSL

[![Build Status](https://github.com/music-of-the-ainur/quenya-dsl/actions/workflows/quneys-dsl-githubactions.yml/badge.svg)](https://github.com/music-of-the-ainur/quenya-dsl/actions/workflows/quneys-dsl-githubactions.yml)

Adding Quenya DSL dependency to your sbt build:
Adding Quenya-DSL dependency to your sbt build:

```
libraryDependencies += "com.github.music-of-the-ainur" %% "quenya-dsl" % "1.2.2-3.3"
libraryDependencies += "com.github.music-of-the-ainur" %% "quenya-dsl" % "1.2.2-$SPARK_VERSION"
```

To run in spark-shell:

```
spark-shell --packages "com.github.music-of-the-ainur:quenya-dsl_2.13:1.2.2-3.3"
spark-shell --packages "com.github.music-of-the-ainur:quenya-dsl_2.12:1.2.0-$SPARK_VERSION"
```

### Connector Usage

#### Maven / Ivy Package Usage
The connector is also available from the
[Maven Central](https://mvnrepository.com/artifact/com.github.music-of-the-ainur)
repository. It can be used using the `--packages` option or the
`spark.jars.packages` configuration property. Use the following value
Quenya-DSL is available in the [Maven Central](https://mvnrepository.com/artifact/com.github.music-of-the-ainur)
repository.

| version | Connector Artifact |
| versions | Connector Artifact |
|----------------------------|-----------------------------------------------------------|
| Spark 3.3.x and scala 2.13 | `com.github.music-of-the-ainur:quenya-dsl_2.13:1.2.2-3.3` |
| Spark 3.3.x and scala 2.12 | `com.github.music-of-the-ainur:quenya-dsl_2.12:1.2.2-3.3` |
@@ -32,7 +28,7 @@ repository. It can be used using the `--packages` option or the
| Spark 2.4.x and scala 2.11 | `com.github.music-of-the-ainur:quenya-dsl_2.11:1.2.2-2.4` |

## Introduction
Quenya DSL(Domain Specific Language) is a language that simplifies the task to parser complex semi-structured data.
Quenya-DSL (Domain Specific Language) is a language that simplifies the task of parsing complex semi-structured data.

```scala

@@ -155,7 +151,7 @@ Output:

## DSL Generator

You can generate a DSL based on a DataFrame:
You can generate the DSL from an existing DataFrame:

```scala
import com.github.music.of.the.ainur.quenya.QuenyaDSL
@@ -165,6 +161,18 @@ val quenyaDsl = QuenyaDSL
quenyaDsl.printDsl(df)
```

### getDsl
You can generate a DSL from a DataFrame and assign it to a variable:

```scala
import com.github.music.of.the.ainur.quenya.QuenyaDSL

val df:DataFrame = ...
val quenyaDsl = QuenyaDSL
val dsl = quenyaDsl.getDsl(df)
```


json:
```
{
@@ -201,6 +209,50 @@ weapon@weapon

You can _alias_ columns by their fully qualified names with ```printDsl(df,true)```; turn this on when column names conflict.

## How to Handle Special Characters



Use backticks (**``**) to quote names that contain special characters such as spaces, semicolons, hyphens, and colons.
Example:



json:
```
{
"name":{
"name One":"Mithrandir",
"Last-Name":"Olórin",
"nick:Names":[
"Gandalf the Grey",
"Gandalf the White"
]
},
"race":"Maiar",
"age":"immortal",
"weapon;name":[
"Glamdring",
"Narya",
"Wizard Staff"
]
}
```



DSL:
```
age$age:StringType
`name.Last-Name`$`Last-Name`:StringType
`name.name One`$`name-One`:StringType
`name.nick:Names`@`nick:Names`
`nick:Names`$`nick:Names`:StringType
race$race:StringType
`weapon;name`@`weapon;name`
`weapon;name`$`weapon_name`:StringType
```
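
A minimal sketch of the quoting rule, assuming the two character classes used by the `col` parser in this commit (`[\w.]+` for plain names, `[\w. \-:;$]+` inside backticks). `QuoteSketch` and `quoteIfNeeded` are hypothetical names used for illustration and are not part of the Quenya-DSL API:

```scala
object QuoteSketch {
  // Character classes copied from the `col` parser in this commit.
  private val Plain  = """[\w.]+""".r
  private val Quoted = """[\w. \-:;$]+""".r

  /** Hypothetical helper: returns the name as it could be written in the DSL,
    * or None when it contains a character neither form accepts. */
  def quoteIfNeeded(name: String): Option[String] = name match {
    case Plain()  => Some(name)        // letters, digits, underscore, dot: no quoting
    case Quoted() => Some(s"`$name`")  // space, hyphen, colon, semicolon, dollar: wrap in backticks
    case _        => None              // anything else is outside both classes
  }
}
```

Under these assumptions, `QuoteSketch.quoteIfNeeded("weapon;name")` yields the backtick-quoted form, while a name such as `a#b` yields `None` because `#` is outside both character classes.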

## Backus–Naur form

```
@@ -216,14 +268,6 @@ You can _alias_ using the fully qualified name using ```printDsl(df,true)```, yo
| DoubleType | FloatType | ByteType | IntegerType | LongType | ShortType
```

## Requirements

| Software | Version |
|--------------|-----------|
| Java | 8 |
| Scala | 2.11/2.12 |
| Apache Spark | 2.4 |

## Author
Daniel Mantovani [daniel.mantovani@modak.com](mailto:daniel.mantovani@modak.com)

6 changes: 6 additions & 0 deletions build.sbt
@@ -32,6 +32,12 @@ ThisBuild / developers := List(
name = "Daniel Mantovani",
email = "daniel.mantovani@modakanalytics.com",
url = url("https://github.com/music-of-the-ainur")
),
Developer(
id = "ChinthapallyAkanksha",
name = "Akanksha Chinthapally",
email = "akanksha.chinthapally@modak.com",
url = url("https://github.com/music-of-the-ainur")
)
)

@@ -6,7 +6,6 @@ import scala.util.parsing.combinator._

/*
Spark DSL Backus-Naur form.
<dsl> ::= \{"[\r\n]*".r <precedence> <col> <operator> \}
<precedence> ::= "[\s\t]*".r
<col> ::= "a-zA-Z0-9_.".r [ element ]
@@ -15,9 +14,8 @@ import scala.util.parsing.combinator._
<@> ::= @ <alias>
<$> ::= $ <alias> : <datatype>
<alias> ::= "0-9a-zA-Z_".r
<datatype> ::= BinaryType | BooleanType | StringType | TimestampType | DecimalType
<datatype> ::= BinaryType | BooleanType | StringType | TimestampType | DecimalType
| DoubleType | FloatType | ByteType | IntegerType | LongType | ShortType
*/
private[quenya] trait CombinatorParser {
val parser = ParserQuenyaDsl
@@ -30,9 +28,16 @@ private[quenya] trait CombinatorParser {
}
}

object ParserQuenyaDsl extends JavaTokenParsers {
trait ParserUtil {
def removeLiteral(content: String, literal: String): String = {
if (content.head.toString == literal && content.last.toString == literal)
content.substring(1, content.length - 1)
else content
}
}
object ParserQuenyaDsl extends JavaTokenParsers with ParserUtil {
override val skipWhitespace = false

def dsl: Parser[List[Statement]] = repsep(expression,"""[\n\r]*""".r) ^^ (List() ++ _ )
def expression: Parser[Statement] = precedence ~ col ~ operator ^^ {
case prec ~ cl ~ op =>
@@ -45,16 +50,21 @@ object ParserQuenyaDsl extends JavaTokenParsers {
case al:String => Statement(prec,cl,AT,al)
}
}
def precedence: Parser[Int] = """[\t\s]*""".r ^^ (prec => prec.replaceAll(" ","\t").count(_ == '\t'))
def col: Parser[StateSelect] = """[0-9A-Za-z._]+""".r ~ opt(element) ^^ {
case a ~ Some(b) => StateSelect(a,b)
case a ~ None => StateSelect(a,None)
def precedence: Parser[Int] = """^[\t\s]*""".r ^^ (prec => prec.replaceAll(" ","\t").count(_ == '\t'))
def col: Parser[StateSelect] = """[\w.]+|`[\w. \-:;$]+`""".r ~ opt(element) ^^ {
case a ~ Some(b) => createStateSelect(a,b)
case a ~ None => createStateSelect(a,None)
}
private def createStateSelect(name: String, element: Option[String]): StateSelect =
StateSelect(removeLiteral(name,"`"),element)

def element: Parser[Option[String]] = "[" ~> opt("""\d+""".r) <~ "]"
def operator: Parser[Any] = at | dollar
def at: Parser[String] = "@" ~> alias
def dollar : Parser[Any] = "$" ~> alias ~ opt(":") ~ datatype
def alias : Parser[String] = "[0-9a-zA-Z_]+".r
def alias : Parser[String] = """\w+|`[\w \-:;$]+`""".r ^^ {
alias => removeLiteral(alias,"`")
}
def datatype : Parser[Option[DataType]] = ("BinaryType" ^^ (dt => Some(BinaryType))
| "FloatType" ^^ (dt => Some(FloatType))
| "ByteType" ^^ (dt => Some(ByteType))
@@ -67,4 +77,4 @@ object ParserQuenyaDsl extends JavaTokenParsers {
| "DoubleType" ^^ (dt => Some(DoubleType))
| "ShortType" ^^ (dt => Some(ShortType))
| "" ^^ (dt => None))
}
}
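
The backtick stripping done by `removeLiteral` can be sketched on its own. The version below is an illustration rather than the committed code: it adds a length guard (an assumption of this sketch) so that empty or one-character inputs are returned unchanged instead of throwing. The committed version avoids that case only because the `col` and `alias` regexes never produce such inputs. `RemoveLiteralSketch` is a hypothetical name:

```scala
object RemoveLiteralSketch {
  /** Strips `literal` from both ends of `content` when it is present on both sides.
    * The length and non-empty checks are additions of this sketch. */
  def removeLiteral(content: String, literal: String): String =
    if (literal.nonEmpty &&
        content.length >= 2 * literal.length &&
        content.startsWith(literal) &&
        content.endsWith(literal))
      content.substring(literal.length, content.length - literal.length)
    else
      content // unquoted, too short, or quoted on one side only: leave as-is
}
```

With this guard, a quoted name such as `` `weapon;name` `` loses its surrounding backticks, and a plain name such as `race` passes through untouched.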
8 changes: 4 additions & 4 deletions src/test/resources/data.csv
@@ -1,10 +1,10 @@
age,LastName,nameOne,nickNames,race,weapon
3500,Olórin,Mithrandir,Gandalf the Grey,Maiar,Glamdring
3500,Olórin,Mithrandir,Gandalf the Grey,Maiar,Narya
3500,Olórin,Mithrandir,Gandalf the Grey,Maiar,Wizard Staff
3500,Olórin,Mithrandir,Gandalf the Grey,Maiar,Glamdring
3500,Olórin,Mithrandir,Gandalf the Grey,Maiar,Narya
3500,Olórin,Mithrandir,Gandalf the Grey,Maiar,Wizard Staff
4500,"",Ilmarë,"",Ainur,Powers of the Ainur
3500,"",Morgoth,Bauglir,Ainur,Powers of the Ainur
3500,"",Morgoth,Bauglir,Ainur,Grond
3500,"",Morgoth,Bauglir,Ainur,Mace
3500,"",Morgoth,Bauglir,Ainur,Sword
3500,"",Manwë,"King of Arda,",Ainur,Powers of the Ainur
3500,"",Manwë,"King of Arda,",Ainur,Powers of the Ainur
50 changes: 50 additions & 0 deletions src/test/resources/data.json
@@ -0,0 +1,50 @@
{
"Coffee": {
"sub region": [
{
"id": 1,
"full name": "John Doe"
},
{
"id": 2,
"name": "Don Joeh"
}
],
"country": {
"id": 2,
"company": "ACME"
}
},
"brewing": {
"sub-region": [
{
"id": 1,
"name": "John Doe"
},
{
"id": 2,
"name": "Don Joeh"
}
],
"world:country": {
"id": 2,
"company": "ACME"
}
},
"brewing2": {
"sub;region": [
{
"id": 1,
"name": "John Doe"
},
{
"id": 2,
"name": "Don Joeh"
}
],
"world;country": {
"id": 2,
"company": "ACME"
}
}
}
Binary file not shown.
Binary file not shown.
Empty file.
Binary file not shown.