Forums : The Ohcount Project

diff support?

Would it be possible to add diff counting support to ohcount (and ohloh)?

It would be really nice if ohcount could detect languages used inside diff files and count them like plain code.

Michał Górny over 15 years ago
 

Diff files are not source code files. It doesn't make sense to count them as such. Care to elaborate?

Anonymous Coward over 15 years ago
 

Hi Michał,

We have come across a few projects on Ohloh where the source control system didn't contain code, but rather contained diff files. Discovering this was sort of a mind-bending source-control-inside-of-source-control moment. It implies that the source control database itself contains diff files of diff files!

The levels-within-levels problem this causes for us is extremely hard to explain simply.

From a simple perspective, I can agree that when a new diff file appears that, say, adds 8 new lines of C code, a reasonable interpretation is that this commit should be credited with 8 new lines of C.

However, this is not at all how Ohloh works internally. We assume that we are given a series of snapshots, and then we calculate deltas from those. Those deltas are what actually get stored in our database to build our reports.
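To make the snapshot model concrete, here is a minimal sketch of the idea, not Ohloh's actual code. It assumes each snapshot has already been reduced to a mapping from language name to line count; the function name `snapshot_delta` and the sample numbers are made up for illustration.

```python
def snapshot_delta(before, after):
    """Return the per-language change in line counts between two snapshots.

    Both arguments are hypothetical {language: line_count} mappings.
    Languages whose count did not change are left out of the result.
    """
    languages = set(before) | set(after)
    return {
        lang: after.get(lang, 0) - before.get(lang, 0)
        for lang in sorted(languages)
        if after.get(lang, 0) != before.get(lang, 0)
    }


# Two consecutive snapshots of an imaginary repository.
prev = {"c": 120, "python": 40}
curr = {"c": 128, "python": 40, "shell": 5}

print(snapshot_delta(prev, curr))  # {'c': 8, 'shell': 5}
```

Under this model a checked-in diff file is just another file in the snapshot, so the delta describes how the diff file changed, not what the diff says about the code it patches.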

If the source control system contains diffs instead of source code, then taking those deltas is a mistake. What we should do in that case is parse the deltas right out of the diffs, and store those directly in our database. But that's a complete disruption to our current methodology, and we'd have to write a lot of brand-new code for that.

Also, continuity isn't guaranteed at all. Nothing says that the diff files in the source control form a coherent trail. They might refer to unrelated files, or they might not join together properly. Two different diff files might contain directly contradictory instructions. There's no way that Ohloh could guarantee anything about that data.

That's not to say this is all theoretically impossible; it's just not easy.

On a much simpler level: Yes, I think you could modify Ohcount to look at a single diff in isolation and have it declare that this diff file contains instructions to add 8 new lines of C code. That in itself will have some small challenges, but it might make a fun coding project.
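As a rough illustration of the single-diff idea, here is a sketch in Python rather than in Ohcount's own code. It assumes unified diff format, guesses the language from the file extension using a tiny hypothetical table (`EXT_TO_LANG`), and counts every added line, without Ohcount's split into code, comments, and blanks. The function name `count_added_lines` is also invented for this example.

```python
import os
import re
from collections import Counter

# Hypothetical extension table; real language detection is far richer.
EXT_TO_LANG = {".c": "c", ".h": "c", ".py": "python", ".rb": "ruby"}

# Hunk header such as "@@ -1,3 +1,5 @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def count_added_lines(diff_text):
    """Count added lines per language in one unified diff, read in isolation."""
    added = Counter()
    lang = None
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk body: use the header counts to know where it
            # ends, so an added line like "+++ x" is not taken for a header.
            if line.startswith("+"):
                new_left -= 1
                if lang is not None:
                    added[lang] += 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                old_left -= 1  # context line appears on both sides
                new_left -= 1
            continue
        if line.startswith("+++ "):
            # Target file header; drop any tab-separated timestamp.
            path = line[4:].split("\t")[0].strip()
            lang = EXT_TO_LANG.get(os.path.splitext(path)[1])
        else:
            match = HUNK_RE.match(line)
            if match:
                old_left = int(match.group(1) or 1)
                new_left = int(match.group(2) or 1)
    return dict(added)


sample = "\n".join([
    "--- a/hello.c",
    "+++ b/hello.c",
    "@@ -1,3 +1,5 @@",
    " #include <stdio.h>",
    "+",
    "+static int answer = 42;",
    " int main(void) {",
    " }",
])

print(count_added_lines(sample))  # {'c': 2}
```

Tracking the hunk counts is one of the small challenges: without them, an added line whose content begins with `++ ` looks exactly like a file header.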

But it's all one level too deep -- our tools expect snapshots, not instructions on how to change non-existent snapshots.

Whew.

Robin Luckey over 15 years ago