SVNRepository - get properties and filesize efficiently

Hi,
We are maintaining an open-source search/indexing service for Subversion which requests more or less all information from each commit. Moving from svnlook to DAVRepository/HttpV2 is a current initiative.

The framework needs the properties and the filesize before (potentially) getting the content stream. Performance would benefit from getting both the properties and the filesize in a single request in these two situations:

  • Get a single file: info(…) can provide filesize (no props) while getFile() can provide props (no filesize).
  • List the directory contents: getDir(…) can provide filesize of children (no props).

I have verified that DAVUtil.getProperties(…) can get ‘allprop’ in combination with ‘Depth: 1’ from mod_dav_svn 1.13.0.
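For reference, a request of the kind described above can be sketched with the JDK's java.net.http client. This is a minimal sketch that only builds the request and does not send it. The URL is a placeholder, and the body is the standard WebDAV (RFC 4918) allprop form, not something captured from DAVUtil.getProperties(…).

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class PropfindRequest {
    // Standard WebDAV (RFC 4918) allprop request body.
    static final String ALLPROP_BODY =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
        + "<D:propfind xmlns:D=\"DAV:\"><D:allprop/></D:propfind>";

    // Builds, but does not send, a PROPFIND for a directory and its immediate children.
    static HttpRequest buildAllpropDepth1(URI directoryUrl) {
        return HttpRequest.newBuilder(directoryUrl)
            .header("Depth", "1") // the directory itself plus one level of children
            .header("Content-Type", "text/xml; charset=utf-8")
            .method("PROPFIND", HttpRequest.BodyPublishers.ofString(ALLPROP_BODY))
            .build();
    }

    public static void main(String[] args) {
        // Placeholder URL, not a real repository.
        HttpRequest request = buildAllpropDepth1(URI.create("https://svn.example.com/repo/trunk/"));
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Depth: " + request.headers().firstValue("Depth").orElse("(none)"));
    }
}
```

Sending it would also need authentication and error handling, which SVNKit's own connection layer already takes care of.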

Can you recommend any other API in SVNRepository/DAVRepository for these two situations?

Would you consider adding properties to the SVNDirEntry object?

Thanks,
Thomas Å.

Hello Thomas,
I don’t fully understand your question. I’ll try to rephrase it as I understand it; please confirm whether I have understood you correctly.

As I understand it, info(…) and getFile(…) make basically the same underlying HTTP request, and these methods just pick a subset of the information the response returns. In particular, info(…) filters out the properties while getFile(…) filters out the size. But there could be a method that returns both. The same applies to getDir(…).
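The idea that one response can carry both the size and the properties can be illustrated by parsing a multistatus reply to the ‘allprop’ / ‘Depth: 1’ PROPFIND Thomas mentions. The sketch below uses the JDK's DOM parser to build a per-path record of size plus properties. The sample XML is hand-written and abbreviated, not captured from mod_dav_svn. The namespace URIs for Subversion properties are what mod_dav_svn is generally understood to use, but treat them as assumptions to verify. `EntryInfo` is a hypothetical type, not part of SVNKit.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class MultistatusParser {
    static final String DAV = "DAV:";
    // Assumed namespaces used by mod_dav_svn for svn:* and custom (user) properties.
    static final String SVN_NS = "http://subversion.tigris.org/xmlns/svn/";
    static final String CUSTOM_NS = "http://subversion.tigris.org/xmlns/custom/";

    // Hypothetical combined view of one entry: size (-1 if absent, e.g. directories) plus properties.
    record EntryInfo(long size, Map<String, String> props) {}

    // Maps each <D:href> to its size and Subversion properties; propstat status codes are ignored here.
    static Map<String, EntryInfo> parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookups below
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, EntryInfo> entries = new LinkedHashMap<>();
        NodeList responses = doc.getElementsByTagNameNS(DAV, "response");
        for (int i = 0; i < responses.getLength(); i++) {
            Element response = (Element) responses.item(i);
            String href = response.getElementsByTagNameNS(DAV, "href").item(0).getTextContent().trim();
            long size = -1;
            Map<String, String> props = new LinkedHashMap<>();
            NodeList propBlocks = response.getElementsByTagNameNS(DAV, "prop");
            for (int p = 0; p < propBlocks.getLength(); p++) {
                // Direct children of <D:prop> are the properties; skip whitespace text nodes.
                for (Node n = propBlocks.item(p).getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (n.getNodeType() != Node.ELEMENT_NODE) continue;
                    String ns = n.getNamespaceURI();
                    String name = n.getLocalName();
                    if (DAV.equals(ns) && "getcontentlength".equals(name)) {
                        size = Long.parseLong(n.getTextContent().trim());
                    } else if (SVN_NS.equals(ns)) {
                        props.put("svn:" + name, n.getTextContent());
                    } else if (CUSTOM_NS.equals(ns)) {
                        props.put(name, n.getTextContent());
                    }
                }
            }
            entries.put(href, new EntryInfo(size, props));
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        // Hand-written, abbreviated sample; not a real server capture.
        String sample = """
            <D:multistatus xmlns:D="DAV:"
                xmlns:S="http://subversion.tigris.org/xmlns/svn/"
                xmlns:C="http://subversion.tigris.org/xmlns/custom/">
              <D:response><D:href>/repo/trunk/</D:href>
                <D:propstat><D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop>
                <D:status>HTTP/1.1 200 OK</D:status></D:propstat></D:response>
              <D:response><D:href>/repo/trunk/a.txt</D:href>
                <D:propstat><D:prop>
                  <D:getcontentlength>42</D:getcontentlength>
                  <S:mime-type>text/plain</S:mime-type>
                  <C:owner>thomas</C:owner>
                </D:prop><D:status>HTTP/1.1 200 OK</D:status></D:propstat></D:response>
            </D:multistatus>
            """;
        parse(sample).forEach((path, info) -> System.out.println(path + " -> " + info));
    }
}
```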

By the way, you could be interested in using status(…) for crawling SVN repository:
http://vcs.atspace.co.uk/2013/03/16/fast-listing-of-svn-repository-with-svn-crawler/

With a single query you can get the SVN repository structure, i.e. directories, their children, and properties. I don’t remember whether it returns file sizes. If it doesn’t, you can create another SVNRepository connection and query this information in parallel with crawling the repository.
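The second-connection idea can be sketched as follows, assuming (as is generally advised for SVNKit) that a single SVNRepository instance should not be shared between threads. Each file path found by the crawl is handed to a dedicated single-threaded executor that owns the second connection. `SizeLookup` is a hypothetical interface standing in for whatever call returns the size (for example info(…)), and `discoveredFiles` stands in for paths reported by the status crawl; nothing here is SVNKit API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelSizeFetch {
    // Hypothetical stand-in for a size query over the second connection.
    interface SizeLookup {
        long sizeOf(String path) throws Exception;
    }

    // Walks discovered files on the calling thread and queues each size query on a
    // single-threaded executor, so the second connection is only ever used by one thread.
    static Map<String, Long> crawl(List<String> discoveredFiles, SizeLookup secondConnection) {
        ExecutorService sizeWorker = Executors.newSingleThreadExecutor();
        Map<String, Long> sizes = new ConcurrentHashMap<>();
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        try {
            for (String path : discoveredFiles) {
                pending.add(CompletableFuture.runAsync(() -> {
                    try {
                        sizes.put(path, secondConnection.sizeOf(path));
                    } catch (Exception e) {
                        throw new CompletionException(e);
                    }
                }, sizeWorker));
                // ... index this path's properties from the status crawl here ...
            }
            // Block until every queued size query has finished; throws if any of them failed.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        } finally {
            sizeWorker.shutdown();
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Fake lookup for demonstration: pretend a file's size is the length of its path.
        Map<String, Long> sizes = crawl(List.of("/trunk/a.txt", "/trunk/b/c.java"), path -> path.length());
        System.out.println(sizes);
    }
}
```

A single worker keeps the sketch simple; more workers would need one connection each.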

Hello Dmitry,

Thanks for your quick response. Yes, you have understood my question correctly.

We are using status for other components; however, the indexing is very much focused on the changes of each commit rather than the directory/file hierarchy. I have not been successful in getting the filesize using status / diff / replay reports.

The list-report can provide filesize but it seems non-implemented in SVNKit.

It seems a bit difficult to extend DAVRepository with an additional method. I might be able to wrap it and use some reflection to access openConnection / getConnection / closeConnection.
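The reflection route can be sketched generically. `Library` below is a stand-in, not DAVRepository, and whether openConnection / getConnection / closeConnection take arguments in SVNKit is not checked here; the sketch only shows the mechanics of calling a non-public, no-argument method through java.lang.reflect. It also shows why the approach is brittle: a renamed or re-parameterised method fails only at run time.

```java
import java.lang.reflect.Method;

public class ReflectiveAccess {
    // Stand-in for a library class whose useful methods are not public.
    static class Library {
        protected String openConnection() { return "connected"; }
    }

    // Invokes a no-arg, non-public method by name, searching up the class hierarchy.
    static Object invokeHidden(Object target, String methodName) throws Exception {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Method m = c.getDeclaredMethod(methodName);
                m.setAccessible(true); // bypasses Java access checks; fragile across library versions
                return m.invoke(target);
            } catch (NoSuchMethodException e) {
                // Not declared on this class; try the superclass.
            }
        }
        throw new NoSuchMethodException(methodName);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(invokeHidden(new Library(), "openConnection")); // prints "connected"
    }
}
```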

Then one should check whether the native SVN 1.13 API allows getting the size and the properties at the same time. If it does, a corresponding method should be added to SVNRepository. If it doesn’t and this is specific to the DAV protocol, I see no problem with adding such a method to DAVRepository directly. I don’t understand why you think it’s “a bit difficult”.

DAVRepository is not part of the public API, so adding a method to it wouldn’t break binary or protocol-level compatibility.

Sorry, that was unclear. It is straightforward to add a method directly to DAVRepository if you merge it into SVNKit.

I was looking into extending / wrapping DAVRepository without modifying the SVNKit build. That seems a bit difficult since they are instantiated by SVNRepositoryFactory.

If you’re OK with such a “hacky” approach, SVNRepositoryFactory is not a problem: if you look at the DAVRepositoryFactory code, you’ll see it’s pretty straightforward and does little beyond calling the constructor for the DAV protocol.

It’s the svn+custom_tunnel:// protocol that would make the repository factory non-trivial. For the DAV protocol, you can create a subclass of DAVRepository and call its constructor.

This way you probably wouldn’t be able to use DefaultSVNRepositoryPool, but I don’t know whether you need it.

Otherwise, I’m OK with accepting patches to the DAVRepository class directly, as its API is not public anyway.

Thanks Dmitry, that helped clarify how to construct DAVRepository. I now see that I can easily subclass DAVRepository and sort out construction in our injected providers.

If I understand correctly, I can replace the factory with:
new DAVRepository(IHTTPConnectionFactory.DEFAULT, location, null);
or, in a subclass:
super(IHTTPConnectionFactory.DEFAULT, location, null);

I thought we were using DefaultSVNRepositoryPool somewhere, but it seems that is not an issue either.

Update: the constructor is protected, but super(…) from a subclass should work.
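The construction route settled on here can be sketched without SVNKit on the classpath. `RepositoryBase` is a stand-in for DAVRepository (a protected constructor taking a connection factory, a location and an optional session), and `IndexingRepository` is a hypothetical subclass with a public constructor that an injected provider can call directly instead of going through the factory. In real code the super call would be the `super(IHTTPConnectionFactory.DEFAULT, location, null)` quoted above; none of the types below are SVNKit classes.

```java
import java.net.URI;
import java.util.function.Supplier;

public class SubclassConstruction {
    // Stand-in for DAVRepository. Outside its package, `new RepositoryBase(...)` would not
    // compile because the constructor is protected; only super(...) in a subclass can reach it.
    static class RepositoryBase {
        private final URI location;

        protected RepositoryBase(Object connectionFactory, URI location, Object session) {
            this.location = location;
        }

        URI getLocation() { return location; }
    }

    // Hypothetical subclass: public constructor for an injected provider; an extra
    // combined size-and-properties method could be added here.
    static class IndexingRepository extends RepositoryBase {
        public IndexingRepository(URI location) {
            super(null /* IHTTPConnectionFactory.DEFAULT in the real code */, location, null);
        }
    }

    // Provider in dependency-injection style: creates a new repository on every call.
    static Supplier<IndexingRepository> provider(URI location) {
        return () -> new IndexingRepository(location);
    }

    public static void main(String[] args) {
        IndexingRepository repo = provider(URI.create("https://svn.example.com/repo")).get();
        System.out.println(repo.getLocation()); // prints https://svn.example.com/repo
    }
}
```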