Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

do not attempt to retrieve history using the repository method for files without history cache entry for repos with history for directories #759

Open
vladak opened this issue Feb 21, 2014 · 4 comments
Assignees

Comments

@vladak
Copy link
Member

vladak commented Feb 21, 2014

I was playing with indexing on OS X (for #687) with one of my repos and discovered that untracked files cause unhandled exceptions in the phase of generating Lucene index (from addFile()). It would be nice to catch these exceptions and generate log message with WARNING or maybe INFO level.

@vladak vladak added the bug label Feb 21, 2014
@vladak vladak self-assigned this Feb 21, 2014
@tarzanek
Copy link
Contributor

What does luke say? Were those files added?
If yes, then they were added without history which is desired behaviour - and the stack goes into fine level i think ....

@vladak
Copy link
Member Author

vladak commented Feb 24, 2014

The actual output looks like this:

11:44:51 WARNING: Non-zero exit status 255 from command [/usr/bin/hg, log, -f, --template, changeset: {rev}:{node|short}\n{branches}{tags}{parents}\nuser: {author}\ndate: {date|isodate}\nfiles: {files}\ndescription: {desc|strip|obfuscate}\n, race/test.dat] in directory /Users/vladimirkotal: abort: cannot follow file not in parent revision: "race/test.dat"
11:44:51 WARNING: An error occurred while reading history: 
org.opensolaris.opengrok.history.HistoryException: Failed to get history for: "/private/var/tmp/base/src/unix-prog-src/race/test.dat" Exit code: 255
    at org.opensolaris.opengrok.history.MercurialHistoryParser.parse(MercurialHistoryParser.java:79)
    at org.opensolaris.opengrok.history.MercurialRepository.getHistory(MercurialRepository.java:501)
    at org.opensolaris.opengrok.history.MercurialRepository.getHistory(MercurialRepository.java:494)
    at org.opensolaris.opengrok.history.FileHistoryCache.get(FileHistoryCache.java:307)
    at org.opensolaris.opengrok.history.HistoryGuru.getHistory(HistoryGuru.java:212)
    at org.opensolaris.opengrok.history.HistoryGuru.getHistoryReader(HistoryGuru.java:175)
    at org.opensolaris.opengrok.analysis.AnalyzerGuru.populateDocument(AnalyzerGuru.java:284)
    at org.opensolaris.opengrok.index.IndexDatabase.addFile(IndexDatabase.java:591)
    at org.opensolaris.opengrok.index.IndexDatabase.indexDown(IndexDatabase.java:842)
    at org.opensolaris.opengrok.index.IndexDatabase.indexDown(IndexDatabase.java:812)
    at org.opensolaris.opengrok.index.IndexDatabase.update(IndexDatabase.java:379)
    at org.opensolaris.opengrok.index.IndexDatabase$1.run(IndexDatabase.java:168)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

I am not sure why there has to be the exception stack trace at all (for this specific case).

@vladak vladak added the indexer label Mar 22, 2024
@vladak
Copy link
Member Author

vladak commented Mar 25, 2024

With the recent Mercurial (6.1.1) the hg log no longer exits with non-zero code when the file is untracked, even with -f/--follow (which current OpenGrok version does not use unless renamed support is on for given repository) so the exception is no longer reported.

That said, this is a question of efficiency. The hg log command is executed which will slow the indexer down. For such repository types, the fallback should be false when calling getHistory() to avoid it because when -H is used the history cache should be generated for such repositories (i.e. these with the capability to retrieve history for directories) in the 1st phase of indexing.

For repositories with hasHistoryForDirectories() returning false (e.g. SCCS), this has to still work because for these the history will be only stored in the 2nd phase of the indexing via the populateDocumentHistory().

@vladak vladak changed the title unhandled exceptions when indexing repo with files untracked by SCM do not attempt to retrieve history using the repository method for files without history cache entry for repos with history for directories Mar 25, 2024
@vladak
Copy link
Member Author

vladak commented Mar 25, 2024

Actually, the check is already being done in HistoryGuru#getHistoryFromRepository():

/*
* Some mirrors of repositories which are capable of fetching history
* for directories may contain lots of files untracked by given SCM.
* For these it would be waste of time to get their history
* since the history of all files in this repository should have been
* fetched in the first phase of indexing.
*/
if (env.isIndexer() && isHistoryIndexDone() &&
repository.isHistoryEnabled() && repository.hasHistoryForDirectories() &&
!env.isFetchHistoryWhenNotInCache()) {
LOGGER.fine(() -> String.format("not getting the history for '%s' in repository %s as the it supports "
+ "history for directories", launderLog(file.toString()), repository));
return null;
}

The issue is that the fetchHistoryWhenNotInCache configuration tunable is set to true by default so the history is retrieved also for repositories which support history per directory. Maybe the tunable should be set to false by default and that is why I am keeping this issue open.

That said, the tunable might go away as suggested in #1498 (comment)

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

2 participants