Non-reproducibility in TrackerPhase2OTL1Track #47071

makortel · 2025-01-09T19:30:39Z

Tests of PRs unrelated to L1T show differences in workflows 29634.911 and 29834.999 in the TrackerPhase2OTL1Track, TrackerPhase2OTL1TrackV, and L1T folders. For example, in #47051 (comment):

29634.911 had 66 differences
29834.999 had 15 differences

makortel · 2025-01-09T19:30:48Z

assign l1, dqm, upgrade

cmsbuild · 2025-01-09T19:30:57Z

New categories assigned: l1,dqm,upgrade

@aloeliger,@antoniovagnerini,@epalencia,@Moanwar,@rseidita,@srimanob,@subirsarkar you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild · 2025-01-09T19:30:59Z

cms-bot internal usage

cmsbuild · 2025-01-09T19:31:00Z

A new Issue was created by @makortel.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

mmusich · 2025-01-21T13:19:33Z

@tomalin FYI

skinnari · 2025-01-22T15:39:53Z

hi @makortel , i am confused why these are all showing as failures. if i look at the actual histograms, they all look fine (e.g. https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_15_0_X_2025-01-09-1100+9e6aa1/66377/29634.911_TTbar_14TeV+Run4D110_DD4hep/TrackerPhase2OTL1Track_Tracks_HQ.html). there is one entry difference between the two sets (246 vs 247), is that what is causing these all to be flagged as red?

makortel · 2025-01-22T16:42:16Z

> there is one entry difference between the two sets (246 vs 247), is that what is causing these all to be flagged as red?

Probably? (technical question would be for @cms-sw/pdmv-l2 whose histogram comparison infrastructure is being used in PR tests)

Looking at https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_15_0_X_2025-01-09-1100+9e6aa1/66377/29634.911_TTbar_14TeV+Run4D110_DD4hep/TrackerPhase2OTL1Track__Tracks_HQ_Track_HQ_NStubs.png

There is a "clear" difference between blue and red in the 4-5 bin (probably by 1).

AdrianoDee · 2025-01-23T09:53:22Z

The original sin here is that the comparison is performed via the BinToBin statistical test (instead of the Chi2) and the default threshold is set to 0.9999. It basically checks whether the bins are identical, without taking any uncertainty into account. The rank is then the fraction of perfectly matched bins. So in cases like this, with very few bins, even a single mismatch triggers the failure. I'm trying to find where the BinToBin method is selected.

Now the question would be: do we want to spot these discrepancies? Maybe this case is a bit pathological (and the test could, e.g., take into account the histogram population), but in general I think it would be interesting to be aware of these irreproducibilities, given that we run on exactly the same events.

AdrianoDee · 2025-01-23T10:12:16Z

> I'm trying to find where the BinToBin method is selected.

Ok, of course all the PR comparisons are BinToBin, while usually for the RelMon we use the Chi2. And actually the threshold is way higher: 0.999999999999.
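
To illustrate why a single differing entry turns a sparsely binned histogram red, here is a minimal sketch of the bin-to-bin logic as described above. It is not the actual RelMon code: the function names and the bin contents are hypothetical, and the real tool may treat empty bins or overflow differently. The only things taken from this thread are the idea that the rank is the fraction of identical bins and the threshold value 0.999999999999.

```python
# Sketch of a bin-to-bin comparison; names and data are hypothetical.

def bin_to_bin_rank(ref_bins, new_bins):
    """Return the fraction of bins whose contents are exactly equal."""
    if len(ref_bins) != len(new_bins):
        raise ValueError("histograms must have the same number of bins")
    if not ref_bins:
        return 1.0  # assumption: two empty histograms count as identical
    matched = sum(1 for r, n in zip(ref_bins, new_bins) if r == n)
    return matched / len(ref_bins)


def passes_bin_to_bin(ref_bins, new_bins, threshold=0.999999999999):
    """Pass only if the rank reaches the threshold quoted in this thread."""
    return bin_to_bin_rank(ref_bins, new_bins) >= threshold


# Made-up 10-bin histogram with 246 vs 247 entries: one bin differs by 1.
reference = [0, 0, 0, 0, 120, 80, 30, 16, 0, 0]
candidate = [0, 0, 0, 0, 121, 80, 30, 16, 0, 0]

rank = bin_to_bin_rank(reference, candidate)
print(rank)                                     # 9 of 10 bins match
print(passes_bin_to_bin(reference, candidate))  # False: flagged as a difference
```

With a threshold this close to 1, the test passes in practice only when every bin matches exactly, independent of how many entries the histogram holds.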

makortel · 2025-01-23T14:16:28Z

> Now the question would be: do we want to spot these discrepancies? Maybe this case is a bit pathological (and the test could, e.g., take into account the histogram population), but in general I think it would be interesting to be aware of these irreproducibilities, given that we run on exactly the same events.

So far we have (in practice, at least) required CPU code to be fully reproducible within the same x86 microarchitecture and CPU vendor when running on 1 thread. In all cases so far the cause for non-reproducibility has been a bug somewhere.

srimanob · 2025-01-23T15:33:22Z

Is this issue an extension of #45505 ?

makortel · 2025-01-23T15:45:11Z

> Is this issue an extension of #45505 ?

Based on the (little) information in #45505, I'd guess that issue has a different cause than what is reported here.

makortel · 2025-02-26T17:20:06Z

Is anyone looking into this issue?

cmsbuild added dqm-pending l1-pending pending-signatures upgrade-pending labels Jan 9, 2025

makortel mentioned this issue Jan 16, 2025

Created DroppedDataProductResolver #47117

Merged

mmusich mentioned this issue Jan 21, 2025

fix EventSetupConflict for producer 'SteppingHelixPropagator' #47151

Merged

antoniovagnerini mentioned this issue Jan 21, 2025

[DQM] Apply code checks/format #47002

Merged

This was referenced Jan 21, 2025

Reduced what data held in BranchDescription #47118

Merged

Fix UnnecessaryMutableChecker #47014

Merged

antoniovagnerini mentioned this issue Jan 22, 2025

Rename reduceRange to reducePhiRange #47154

Merged

makortel mentioned this issue Feb 14, 2025

Added VPSetTemplate to configuration #47354

Merged

antoniovagnerini mentioned this issue Feb 18, 2025

Adding SimpleTrackValidation Analyzer #47370

Merged

makortel mentioned this issue Feb 18, 2025

Use HardwareResourcesDescription in ProcessConfiguration #47355

Merged

This was referenced Feb 18, 2025

Fixing RelMon Pages Stylesheets Locations #47382

Merged

protect Phase2IT{OT}ValidateCluster against missing input collections #47398

Merged

This was referenced Feb 19, 2025

Outer Tracker Phase2 DQM: additional stub validation plots #45475

Merged

Developing offline JetMET DQM for Scouting jets - complementary to PR #47212 #47328

Merged

makortel mentioned this issue Feb 20, 2025

[15_0_X] Use HardwareResourcesDescription in ProcessConfiguration #47416

Merged

This was referenced Feb 24, 2025

Add streamers for onlinebeammonitor unit test cms-data/DQM-Integration#9

Merged

introduce @phase2FakeHLT and @phase2ValidationFakeHLT DQM / Validation sequences #47401

Merged

makortel mentioned this issue Feb 24, 2025

Update cmsRunGP executable to link with gperftools libprofiler.so after toolfile change #47277

Merged

This was referenced Feb 24, 2025

move unit test for onlinebeammonitor_dqm_sourceclient-live to use streamer files #47418

Merged

Improvements to HLTObjectMonitor #47429

Merged

This was referenced Feb 26, 2025

Added unit test for checking import ROOT openat calls #43596

Merged

Move CPUServiceBase, RootHandlers, and TimingServiceBase to FWCore/AbstractServices #47466

Merged

Revert "Add HardwareResourcesInformation printout to edmProvDump" #47473

Merged

wddgit mentioned this issue Feb 27, 2025

Access PathsAndConsumesOfModules from new signal #47467

Open

antoniovagnerini mentioned this issue Feb 28, 2025

miscellaneous improvements to HLTriggerOffline/Scouting plugins #47471

Merged

makortel mentioned this issue Feb 28, 2025

Added csv output option to edmModuleEventAllocMonitorAnalyze.py #47481

Merged

antoniovagnerini mentioned this issue Mar 4, 2025

Reading TriggerResults from an InputTag for B2G HLT Analyzers #47483

Merged

This was referenced Mar 5, 2025

Removed Principal::adjustToNewProductRegistry #47486

Merged

[15_0_X] XrdAdaptor: improve exception messages #47516

Open

This was referenced Mar 7, 2025

remove L1T legacy dependencies from pixel and strip DQM clients [14_2_X] #47509

Open

remove L1T legacy dependencies from pixel and strip DQM clients [15_0_X] #47510

Open
