Question about Zero-shot Domain Transfer. #53

Open
wu39848 opened this issue Aug 25, 2024 · 4 comments

@wu39848

wu39848 commented Aug 25, 2024

Hi, thank you for your great work! When I tested the provided model (pretrained on ScanNet without labels) on S3DIS, the results were worse than those reported in Table 14 of the supplementary material.
image

@jihanyang
Member

Hello, Table 14 does not include the background categories (i.e., ceiling, floor, wall). This is stated in the caption of Table 14.
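
To illustrate why excluding categories changes the reported number, here is a minimal sketch of mIoU averaged over a subset of classes. It is not the repository's evaluation code: `per_class_iou`, `mean_iou`, and the toy labels are hypothetical, and whether points with ignored labels are also dropped before scoring depends on the actual implementation.

```python
def per_class_iou(pred, gt, num_classes):
    """Return the IoU of each class; None if the class never appears."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred, gt):
        if p == g:
            inter[p] += 1
            union[p] += 1
        else:
            # A mismatch adds to the union of both the predicted and true class.
            union[p] += 1
            union[g] += 1
    return [inter[c] / union[c] if union[c] else None for c in range(num_classes)]


def mean_iou(ious, ignore_idx=()):
    """Average the IoUs, skipping ignored class indices and absent classes."""
    kept = [v for c, v in enumerate(ious) if c not in ignore_idx and v is not None]
    return sum(kept) / len(kept) if kept else float("nan")


# Toy labels with 4 classes; class 0 stands in for a "background" class.
gt = [0, 0, 0, 0, 1, 1, 2, 3]
pred = [0, 0, 0, 0, 1, 2, 2, 3]

ious = per_class_iou(pred, gt, num_classes=4)
miou_all = mean_iou(ious)                  # averaged over all 4 classes
miou_fg = mean_iou(ious, ignore_idx={0})   # class 0 excluded from the average
print(ious, miou_all, miou_fg)
```

The two averages differ even though the predictions are identical, so a number computed over all 13 S3DIS classes is not directly comparable to one computed without the background categories.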

@wu39848
Author

wu39848 commented Aug 26, 2024

This is my YAML file. After I ignored the background categories, I got the following result:
image
image
Ignoring the background categories doesn't seem to help, and I don't know where the error is.

@jihanyang
Member

Can you show the whole YAML file?

@wu39848
Author

wu39848 commented Aug 27, 2024

This is the whole yaml file:
CLASS_NAMES: [ceiling, floor, wall, beam, column, window, door, table, chair, sofa, bookcase, board, clutter]

DATA_CONFIG:
  BASE_CONFIG: cfgs/dataset_configs/s3dis_dataset.yaml
  ignore_class_idx: [0, 1, 2, 12]

MODEL:
  NAME: SparseUNetTextSeg
  REMAP_FROM_3DLANG: False
  REMAP_FROM_NOADAPTER: False

  VFE:
    NAME: IndoorVFE
    USE_XYZ: True

  BACKBONE_3D:
    NAME: SparseUNetIndoor
    IN_CHANNEL: 6
    MID_CHANNEL: 16
    BLOCK_RESIDUAL: True
    BLOCK_REPS: 2
    NUM_BLOCKS: 7
    CUSTOM_SP1X1: True

  ADAPTER:
    NAME: VLAdapter
    EVAL_ONLY: False
    NUM_ADAPTER_LAYERS: 2
    TEXT_DIM: -1
    LAST_NORM: False
    FEAT_NORM: False

  TASK_HEAD:
    NAME: TextSegHead

    TEXT_EMBED:
      NAME: CLIP
      NORM: True
      PATH: text_embed/s3dis_clip-ViT-B16_id.pth

    LOGIT_SCALE:
      value: 1.0
      learnable: False

TEXT_ENCODER:
  NAME: CLIP
  BACKBONE: ViT-B/16 # ['RN50', 'RN101', 'RN50x4', 'RN50x16', 'RN50x64', 'ViT-B/32', 'ViT-B/16', 'ViT-L/14']
  TEMPLATE: identity
  EXTRACT_EMBED: False # Online extract text embeding from class or not

OPTIMIZATION:
  TEST_BATCH_SIZE_PER_GPU: 1
  BATCH_SIZE_PER_GPU: 4
  NUM_EPOCHS: 32
  LR: 0.004 # 4e-3
  SCHEDULER: cos_after_step
  OPTIMIZER: adamw
  WEIGHT_DECAY: 0.0001
  MOMENTUM: 0.9
  STEP_EPOCH: 20
  MULTIPLIER: 0.1
  CLIP_GRAD: False
  PCT_START: 0.39
  DIV_FACTOR: 1
  MOMS: [0.95, 0.85]
  LR_CLIP: 0.000001

OTHERS:
  PRINT_FREQ: 20
  EVAL_FREQ: 5
  SYNC_BN: False
  USE_AMP: True
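
For reference, a small sketch of which class names remain once the `ignore_class_idx` list from this config is applied to `CLASS_NAMES`. The variable names are only illustrative, and how the repository itself consumes `ignore_class_idx` is not shown in this thread.

```python
# Values copied from the config above.
CLASS_NAMES = ["ceiling", "floor", "wall", "beam", "column", "window", "door",
               "table", "chair", "sofa", "bookcase", "board", "clutter"]
IGNORE_CLASS_IDX = [0, 1, 2, 12]

ignore_set = set(IGNORE_CLASS_IDX)
ignored = [CLASS_NAMES[i] for i in IGNORE_CLASS_IDX]
evaluated = [name for i, name in enumerate(CLASS_NAMES) if i not in ignore_set]

print("ignored:  ", ignored)
print("evaluated:", evaluated)
```

With these values the config ignores ceiling, floor, wall, and clutter, which is one category more than the three background categories named for Table 14, leaving nine classes to evaluate.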
