I am aware of the recovery process to excise a corrupt revision from the repository. The problem is that I’ve had to do that several times now, and everyone using the repo subsequently needs to re-check out that repo, which is a major disruption for all involved.
To answer your question, I get the invalid XML response when I update to any revision >= r17247. r17255 just happens to be the latest commit in the repo, and since svn checks out HEAD by default, that is the revision where the failure shows up. I’m doing another checkout all the way to r17255 but specifically omitting the directories affected by r17247, and things are checking out just fine.
The repo in question is being used in a very unconventional manner that involves storing a lot of office documents in different directories. I am working to improve things, but this is the situation I inherited.
My current theory is this. The users are checking out their working copies into a OneDrive folder so that they can both commit changes to SVN and take advantage of Microsoft’s collaborative features. They are typically using TortoiseSVN to interact with the svn server. I believe that OneDrive is modifying the files after the commit checksum has been calculated, but before TSVN has finished uploading the data.
If this hypothesis is correct, then it will result in a corrupted commit. None of the usual mechanisms that catch bad data in transit would catch this, because the corruption didn’t occur mid-stream; it would have occurred before the data was sent.
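To make the hypothesis concrete, here is a minimal Python sketch of the suspected sequence. It is only a toy model: the function names are made up, MD5 simply stands in for whatever checksum the client actually declares, and none of this is TortoiseSVN, OneDrive, or SVNKit code.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hex digest used as a stand-in for the commit checksum (assumption: MD5)."""
    return hashlib.md5(data).hexdigest()


def simulate_commit(before: bytes, after: bytes) -> tuple[bool, bool]:
    """Model a client that hashes a file, then re-reads it for upload.

    `before` is the file content when the checksum is calculated.
    `after` is the content when the upload happens (changed by a sync tool
    in the suspected race, unchanged otherwise).
    Returns (transport_ok, content_ok).
    """
    declared = checksum(before)       # step 1: client calculates the checksum
    payload = after                   # step 2: client reads the file again to send it
    sent_digest = checksum(payload)   # transport-level integrity covers what was sent
    received = payload                # step 3: bytes arrive intact on the server

    transport_ok = checksum(received) == sent_digest  # nothing changed mid-stream
    content_ok = checksum(received) == declared       # does it match the declared checksum?
    return transport_ok, content_ok


# Normal commit: both checks pass.
print(simulate_commit(b"report v1", b"report v1"))
# Suspected race: the transfer looks healthy, but the content no longer matches.
print(simulate_commit(b"report v1", b"report v1 + sync metadata"))
```

In the second case the transfer itself is clean, so anything that only guards the stream sees no problem; only a comparison against the checksum declared at step 1 reveals the mismatch.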
The only way to catch this would be for SVNKit on the server side to validate the checksum provided by the client, prior to writing the commit to storage.
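As a rough illustration of the kind of check I mean, here is a Python sketch of "validate, then store". This is not SVNKit's API; `validate_then_store`, `ChecksumMismatch`, and the dictionary used as storage are all hypothetical, and MD5 is just an assumed checksum algorithm for the example.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when the received bytes do not match the client's declared checksum."""


def validate_then_store(declared_md5: str, payload: bytes,
                        storage: dict, path: str) -> None:
    """Recompute the checksum on the server and refuse to store on mismatch."""
    actual = hashlib.md5(payload).hexdigest()
    if actual != declared_md5.lower():
        # Reject before anything is written, so the repository never
        # contains a revision whose content disagrees with its checksum.
        raise ChecksumMismatch(
            f"{path}: client declared {declared_md5}, server computed {actual}"
        )
    storage[path] = payload


repo = {}
good = b"quarterly numbers"
validate_then_store(hashlib.md5(good).hexdigest(), good, repo, "docs/q1.xlsx")

try:
    # The declared checksum was taken from `good`, but different bytes arrived.
    validate_then_store(hashlib.md5(good).hexdigest(), b"changed on disk",
                        repo, "docs/q2.xlsx")
except ChecksumMismatch as err:
    print("rejected:", err)
```

The point of checking before the write is that a failed commit is a minor annoyance for one user, whereas a stored corrupt revision affects everyone who updates past it.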
One of the scm-manager developers was nice enough to go through the SVNKit code for me and verify that SVNKit does not validate the incoming checksum, so this bizarre scenario appears to be a possibility.
The only thing I can’t explain is why the error is specifically an XML error. Even if the files being committed are invalid, one would assume that only the CDATA is affected (I’m assuming CDATA; I have no idea what the XML actually looks like) and that the XML structure itself should be fine.
I wish I could give you access to the repo. Perhaps if I were to talk to my superiors, we could do a Zoom session and you could examine the repo directly?