How to get dependency among projects #26

Open
fhbzc opened this issue May 12, 2022 · 12 comments

Comments

@fhbzc
Member

fhbzc commented May 12, 2022

Hi Audris:
I'm trying to get the dependency among projects.
My current approach is
1. get the commits of a given project with P2c
2. get the packages those commits rely on with c2PtAbflDefFullUX.s
However, the second step gives me package names, not project names (e.g. tensorflow instead of tensorflow_tensorflow). Is there a way to find the corresponding project name based on the package name?
Thank you

@audrism
Collaborator

audrism commented May 12, 2022

  1. Files in repos point to package names, not to other repos.
  2. Where the source code is located varies: it could be local, local but not under VCS, in some other repo that the build environment uses, or on a package manager (most common).
  3. A package manager is NOT a repo. It may point to repos where the package is implemented, but not always. These pointers may change over time and, in many cases, do not point to the true upstream repo but to a collection of version dumps, as in cran/cpan.
  4. There are typically many repos where a package is implemented, and there are many ways to decide which one (or ones) to use as the "master" repo(s).

You are also confusing the Def and Pkg usage. Def means the package is implemented there, so the associated repo is potentially the "master" repo. c2PtAbflPkgFullUX.s is a much bigger file that specifies dependencies: the repo-package pair there is a "depends on" relationship, while in c2PtAbflDefFullUX.s it represents an "implements" relationship.

Joining the two by package name transfers the dependency relationship to repos.

At this stage, you probably don't care about individual files and commit times, so the aggregates
da5:/data/play/releases/{Def,Pkg}2PFullU.s are what you need.

Finally, the syntax used to define packages and to specify dependencies varies with language, and the define and depend forms are not always identical. As such, the join of {Def,Pkg}2PFullU.s needs to be done based on that language-specific understanding.
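
The join described above can be sketched in Python. This is only an illustration under stated assumptions: it assumes both aggregate files hold `package;project` rows (the layout seen in a Pkg2PFullU.s row quoted later in this thread, and assumed here for Def2PFullU.s as well), the function name `join_def_pkg` and the sample rows are hypothetical, and the `normalize` hook stands in for the language-specific matching that a real join would need.

```python
from collections import defaultdict

def join_def_pkg(def_lines, pkg_lines, normalize=lambda name: name):
    """Return (dependent_project, upstream_project, package) triples.

    Assumes each line is 'package;project' (an assumption, not a documented format).
    `normalize` is a placeholder for language-specific name matching.
    """
    # Map each package name to the projects that implement (define) it.
    implementers = defaultdict(set)
    for line in def_lines:
        pkg, _, project = line.rstrip("\n").partition(";")
        if pkg and project:
            implementers[normalize(pkg)].add(project)

    # For every "project depends on package" row, look up the implementers.
    edges = set()
    for line in pkg_lines:
        pkg, _, project = line.rstrip("\n").partition(";")
        for upstream in implementers.get(normalize(pkg), ()):
            if upstream != project:  # ignore self-dependencies
                edges.add((project, upstream, pkg))
    return edges

# Hypothetical rows, made up for illustration only.
def_rows = ["Cs:Acme.Json;acme_json", "Cs:Acme.Json;fork_json"]
pkg_rows = ["Cs:Acme.Json;alice_app", "Cs:Other.Lib;bob_tool"]
edges = join_def_pkg(def_rows, pkg_rows)
```

Note that a package implemented in several repos yields one edge per implementing repo, so choosing the "master" repo remains a separate decision.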

@fhbzc
Member Author

fhbzc commented May 12, 2022

Hi Audris:
Thank you for the reply. In c2PtAbflPkgFullUX.s, I get dependency info like stdlib.h, which is essentially a file name, not a package name. How should I get the package name instead? Should I just remove the suffix (change it to stdlib)?
Correct me if I'm wrong, but I now think the correct way to get the dependency info of a given repo is:

1. get all commits that contain a dependency of a focal project from c2PtAbflPkgFullUX.s
2. somehow convert the dependency file to the package it depends on
3. get the commits and projects that implement this package.

Is that correct?
Thank you!

@audrism
Collaborator

audrism commented May 12, 2022

Note that dependency is language-specific: it could be a file, a package, etc. The same holds for Def, so the match also needs to be language-specific.

Also, C++ has namespaces, but plain C does not. Hence C does not have entries in Def.

@fhbzc
Member Author

fhbzc commented May 12, 2022

Hi Audris:
I'm trying to interpret what you mean. Are you saying it's correct to just remove the suffix of a file name to get a package name?

@audrism
Collaborator

audrism commented May 12, 2022

a) What needs to be done depends on the language.
b) Plain C does not even have Def, so there is no way to get the repos where packages are implemented.

@fhbzc
Member Author

fhbzc commented May 13, 2022

Got it.
I'm trying to collect the dependency data for a list of projects (tens of thousands). It seems lookup/getValue doesn't work for c2PtAbflPkgFullUX, so I can only run zcat for each project, which makes the script very slow. Is there a way to speed this up?

@audrism
Collaborator

audrism commented May 13, 2022

See my previous comment

>> I think these are what you need
>> da5:/data/play/releases/{Def,Pkg}2PFullU.s are what you need.
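
One way to avoid running zcat once per project is to stream a file a single time and keep only rows for a set of projects of interest. The sketch below assumes the `.s` file is gzip-compressed (it is read with zcat in this thread) and semicolon-separated, with the project in the second field; the function name `select_projects`, the `project_field` parameter, and the demo file contents are hypothetical.

```python
import gzip
import os
import tempfile

def select_projects(path, wanted, project_field=1):
    """Yield the split fields of rows whose project is in `wanted`.

    Reads the gzip file once; membership tests against a set are O(1),
    so the cost does not grow with the number of projects.
    `project_field` is an assumed column index, adjust to the real layout.
    """
    wanted = set(wanted)
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            fields = line.rstrip("\n").split(";")
            if len(fields) > project_field and fields[project_field] in wanted:
                yield fields

# Demo on a small temporary file with made-up rows.
demo_path = os.path.join(tempfile.mkdtemp(), "Pkg2PFullU.demo.s")
with gzip.open(demo_path, "wt", encoding="utf-8") as fh:
    fh.write("Cs:Acme.Json;alice_app\nCs:Other.Lib;bob_tool\nCs:Acme.Xml;carol_lib\n")

rows = list(select_projects(demo_path, {"alice_app", "carol_lib"}))
```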

@fhbzc
Member Author

fhbzc commented May 13, 2022

Hi Audris:
Thank you for the prompt response.
I checked the file you sent; it gives me something like
Cs:!!!!!!!;guyanderson_BestRestaurants
I assume that means project "guyanderson_BestRestaurants" depends on library "Cs:!!!!!!!", right? So I still need to find the commits that implement this library "Cs:!!!!!!!", which seems possible only with zcat | grep?

Also, I do need the earliest time of each dependency (I want to know when the dependency was first introduced). Is there a way I can get that?

@audrism
Collaborator

audrism commented May 13, 2022

Right, Cs stands for C#. Why exactly do you need commits?

My suggestion is to start by getting all dependencies properly, then worry about the first time each one is introduced.
In any case, the latter is simply a trivial filter on c2PtAbflPkgFullUX.s that updates the time of the dependency whenever an earlier one is found.

@fhbzc
Member Author

fhbzc commented May 14, 2022

Hi Audris:
The reason I need commits is that I need the dependencies between repos, not between repos and libraries.
Can you say a bit more about "a trivial filter on c2PtAbflPkgFullUX.s"? I need the exact timestamp of the earliest commit so that I can compare it with other events, like a tweet mention.

@audrism
Collaborator

audrism commented May 14, 2022

  1. trivial filter
    c2PtAbflPkgFullUX.s has t in it: t is time, P is project, and Pkg is module.
    I presume you would need to update the time (if you find an earlier one) for each (P, Pkg) tuple while reading all these files?
    It requires only a single pass and no lookups, hence trivial.

  2. a harder part is to match Def and Pkg properly in da5:/data/play/releases/{Def,Pkg}2PFullU.s (the same Def and Pkg as in c2PtAbflPkgFullUX.s and c2PtAbflDefFullUX.s).
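
The single-pass filter from item 1 can be sketched as follows. The column positions are an assumption inferred from the file name (c;P;t;A;b;f;l followed by one or more Pkg fields) and should be checked against the actual data; the function name `earliest_dependency`, the index parameters, and the sample rows are hypothetical.

```python
def earliest_dependency(rows, project_field=1, time_field=2, first_pkg_field=7):
    """Return {(project, package): earliest timestamp} in one pass.

    Field indices are assumptions based on the file name c2PtAbflPkg...;
    adjust them to the real column layout.
    """
    earliest = {}
    for line in rows:
        fields = line.rstrip("\n").split(";")
        if len(fields) <= first_pkg_field:
            continue  # row has no package fields
        try:
            t = int(fields[time_field])
        except ValueError:
            continue  # skip rows with a malformed timestamp
        project = fields[project_field]
        for pkg in fields[first_pkg_field:]:
            if not pkg:
                continue
            key = (project, pkg)
            # Keep the smallest timestamp seen so far for this pair.
            if key not in earliest or t < earliest[key]:
                earliest[key] = t
    return earliest

# Made-up rows in the assumed layout: commit;project;time;author;blob;file;lang;pkg...
sample = [
    "c1;alice_app;1600000000;alice;b1;src/a.cs;Cs;Acme.Json;Acme.Xml",
    "c2;alice_app;1500000000;alice;b2;src/b.cs;Cs;Acme.Json",
]
first_seen = earliest_dependency(sample)
```

Because the function takes any iterable of lines, the same loop can be fed from each of the split files in turn, merging the results by keeping the minimum per key.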

@loconous
Collaborator

@fhbzc Could you please verify if this stands as a current issue? If not, please resolve at your convenience.
Thanks!
