(@aws-cdk.aws-glue-alpha): Glue table separator + skip header #23132

Closed
aram-eskandari opened this issue Nov 29, 2022 · 6 comments · Fixed by #24498
Labels
@aws-cdk/aws-glue Related to AWS Glue documentation This is a problem with documentation. effort/small Small work item – less than a day of effort feature-request A feature should be added or improved. p2

Comments

@aram-eskandari

Describe the issue

I have created a Glue table using `DataFormat.CSV` with aws-cdk.aws-glue-alpha==2.51.1a0 and Python 3.7.13. The code looks similar to:
[Screenshot of the table code (image not available)]

Now I'm trying to set the `skip.header.line.count` and `separatorChar` parameters. Specifically, I want to skip the first row and use a semicolon as the separator.

Using L1 constructs, the code would look similar to:
[Screenshot of the equivalent L1 `CfnTable` code (image not available)]
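For reference, the L1 input for such a table can be sketched as a plain object. This is a minimal sketch, assuming the camelCase field names of the `CfnTable` `tableInput` property; the helper `buildCsvTableInput`, the local interfaces, and the S3 location are hypothetical, and the input/output format and SerDe class names are the usual Hive ones for an OpenCSVSerde table. Note that `skip.header.line.count` is a table property, while `separatorChar` is a SerDe parameter.

```typescript
// Hypothetical local types mirroring a subset of CfnTable's tableInput shape.
interface SerdeInfo {
  serializationLibrary: string;
  parameters: Record<string, string>;
}

interface StorageDescriptor {
  columns: { name: string; type: string }[];
  location: string;
  inputFormat: string;
  outputFormat: string;
  serdeInfo: SerdeInfo;
}

interface TableInput {
  name: string;
  tableType: string;
  parameters: Record<string, string>;
  storageDescriptor: StorageDescriptor;
}

// Hypothetical helper: builds a CSV table input that skips `headerLines`
// rows and splits fields on `separator`.
function buildCsvTableInput(
  name: string,
  location: string,
  columns: { name: string; type: string }[],
  separator: string,
  headerLines: number,
): TableInput {
  return {
    name,
    tableType: "EXTERNAL_TABLE",
    // Table properties: skip.header.line.count belongs here.
    parameters: {
      classification: "csv",
      "skip.header.line.count": String(headerLines),
    },
    storageDescriptor: {
      columns,
      location,
      inputFormat: "org.apache.hadoop.mapred.TextInputFormat",
      outputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
      serdeInfo: {
        serializationLibrary: "org.apache.hadoop.hive.serde2.OpenCSVSerde",
        // separatorChar is a SerDe parameter, not a table property.
        parameters: { separatorChar: separator },
      },
    },
  };
}

const input = buildCsvTableInput(
  "my_table",
  "s3://my-bucket/data/", // hypothetical location
  [
    { name: "id", type: "string" },
    { name: "amount", type: "string" },
  ],
  ";",
  1,
);
console.log(JSON.stringify(input, null, 2));
```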

I looked through the L2 documentation but couldn't find anything about this. Any tips?

On a separate note, there is a repeated sentence in the documentation:
[Screenshot of the repeated sentence in the README (image not available)]

Links

https://docs.aws.amazon.com/cdk/api/v2/python/aws_cdk.aws_glue_alpha/README.html#table

@aram-eskandari aram-eskandari added documentation This is a problem with documentation. needs-triage This issue or PR still needs to be triaged. labels Nov 29, 2022
@github-actions github-actions bot added the @aws-cdk/aws-glue Related to AWS Glue label Nov 29, 2022
@peterwoodworth
Contributor

At a quick glance, it doesn't look like this is supported in our L2 construct, so it will need to be done either with the L1 construct, as you've done, or by using escape hatches.

Source code:

const tableResource = new CfnTable(this, 'Table', {

As for the docs, this should be easy to fix by removing the repeated sentence from the README https://github.com/aws/aws-cdk/blob/v2.51.1/packages/%40aws-cdk/aws-glue/README.md

@peterwoodworth peterwoodworth added p2 feature-request A feature should be added or improved. effort/small Small work item – less than a day of effort documentation This is a problem with documentation. and removed needs-triage This issue or PR still needs to be triaged. documentation This is a problem with documentation. labels Nov 29, 2022
@aram-eskandari
Author

I've never used escape hatches, so I'll have to read up on them. Thanks for your response and the pointer, @peterwoodworth!

@WtfJoke
Contributor

WtfJoke commented Jul 5, 2023

@markusl `skip.header.line.count` is a table property and can therefore be set like this:

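import { CfnTable } from "aws-cdk-lib/aws-glue";

// Escape hatch: get the L1 CfnTable behind the L2 `table` construct.
const cfnTable = table.node.defaultChild as CfnTable;
cfnTable.addPropertyOverride("TableInput.Parameters", {
    "skip.header.line.count": "1",
});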
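Why the key goes inside the value object rather than into the path: override paths are split on dots, so writing `"TableInput.Parameters.skip.header.line.count"` would create nested `skip`, `header`, `line` keys instead of one parameter. The sketch below is a simplified model, not CDK's actual implementation: the hypothetical `applyOverride` splits the path on `.` and deep-merges the value into the synthesized properties, which is roughly how an override is applied at synthesis time (CDK also documents escaping a literal dot as `\.`, which this model ignores). The starting properties are illustrative values only.

```typescript
type Json = { [key: string]: unknown };

function isPlainObject(v: unknown): v is Json {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively merge `source` into `target`; non-object values overwrite.
function deepMerge(target: Json, source: Json): void {
  for (const [key, value] of Object.entries(source)) {
    if (isPlainObject(value)) {
      if (!isPlainObject(target[key])) {
        target[key] = {};
      }
      deepMerge(target[key] as Json, value);
    } else {
      target[key] = value;
    }
  }
}

// Hypothetical model of an override: split the path on ".", wrap the value
// in nested objects, then deep-merge the result into the properties.
function applyOverride(props: Json, path: string, value: unknown): Json {
  const nested = path
    .split(".")
    .reduceRight<unknown>((acc, segment) => ({ [segment]: acc }), value) as Json;
  deepMerge(props, nested);
  return props;
}

// Illustrative properties, as an L2 table might render them.
const props: Json = {
  TableInput: { Parameters: { classification: "csv" } },
};

// Key inside the value object: merged next to the existing parameters.
applyOverride(props, "TableInput.Parameters", { "skip.header.line.count": "1" });
console.log(JSON.stringify(props));

// Key inside the path: the dots turn into nested objects instead.
const wrong = applyOverride({}, "TableInput.Parameters.skip.header.line.count", "1");
console.log(JSON.stringify(wrong));
```

In this model the existing `classification` entry survives the first override because objects are merged key by key rather than replaced.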

@markusl
Contributor

markusl commented Jul 5, 2023

@WtfJoke Thanks! I edited my comment for clarity. Overriding `TableInput.Parameters` seems to work for setting `skip.header.line.count`.

@WtfJoke
Contributor

WtfJoke commented Jul 5, 2023

@markusl Yeah, we tested this successfully. Are you sure the properties show up as table properties?
As far as I remember, with your snippet they ended up in the SerDe properties instead of the table properties.

We have also overridden the SerDe properties:

/*
 * Set additional serializer properties (since it's not possible with the current L2 construct):
 * - Switch the serializer from OpenCSVSerde to LazySimpleSerDe (OpenCSVSerde supports only string columns).
 * - Specify ',' as the field separator (default is tab).
 */
cfnTable.addPropertyOverride("TableInput.StorageDescriptor.SerdeInfo", {
    SerializationLibrary: "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    Parameters: {
        "field.delim": ",",
    },
});
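To make the two settings discussed in this thread concrete: `skip.header.line.count` drops the first N lines of a file, and `field.delim` is the character each remaining line is split on. The sketch below is a deliberately simplified model using a hypothetical `parseDelimited` helper and made-up sample data; it does no quote or escape handling (LazySimpleSerDe does not interpret quotes either, unlike OpenCSVSerde), and it treats the input as a single file.

```typescript
// Hypothetical helper: a simplified model of reading one delimited text file
// with skip.header.line.count and field.delim (no quote/escape handling).
function parseDelimited(text: string, fieldDelim: string, skipHeaderLines: number): string[][] {
  return text
    .split("\n")
    .slice(skipHeaderLines) // drop the header line(s)
    .filter((line) => line.length > 0) // ignore a trailing empty line
    .map((line) => line.split(fieldDelim));
}

// Sample file using the semicolon separator from the original question.
const file = "id;amount\n1;10.5\n2;7\n";
const rows = parseDelimited(file, ";", 1);
console.log(rows);
```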

@mergify mergify bot closed this as completed in #24498 Jul 31, 2023
mergify bot pushed a commit that referenced this issue Jul 31, 2023
Includes a `storageParameters` property, allowing developers to access the `tableInput.storageDescriptor.parameters` property within the `CfnTable` resource. 

Closes #23132.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
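Note that the merged change targets a different location than the escape-hatch override in this thread: per the commit message, `storageParameters` fills `tableInput.storageDescriptor.parameters`, whereas `TableInput.Parameters` holds the table properties. A minimal sketch of that distinction follows, with a hypothetical `StorageParam` type and `toStorageDescriptorParameters` helper that are not the module's API. The module README shows entries built with helpers such as `StorageParameter.skipHeaderLineCount(1)`; verify this against the version you use.

```typescript
// Hypothetical shape of one storage parameter: a plain key/value pair.
interface StorageParam {
  key: string;
  value: string;
}

// Hypothetical helper modelling the mapping in the commit message: a list of
// storage parameters becomes the record under tableInput.storageDescriptor.parameters.
// In this model a repeated key keeps its last value.
function toStorageDescriptorParameters(params: StorageParam[]): Record<string, string> {
  const out: Record<string, string> = {};
  for (const { key, value } of params) {
    out[key] = value;
  }
  return out;
}

const tableInputSketch = {
  // Table properties: what the TableInput.Parameters override targets.
  parameters: { classification: "csv" } as Record<string, string>,
  // Storage descriptor parameters: what storageParameters targets.
  storageDescriptor: {
    parameters: toStorageDescriptorParameters([
      { key: "skip.header.line.count", value: "1" },
    ]),
  },
};
console.log(JSON.stringify(tableInputSketch));
```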
@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

5 participants