Forums : Ohloh General Discussion

ohloh's metrics are (very) bad?

there was a post called "criticism" 4 months ago, saying ohloh's metrics are not very good ... or no metrics at all. so i tried to understand the metrics a little better ... and failed:

  1. maven: ohloh says "Over the entire history of the project, 38 contributors have submitted code. 32 have done so in the last year." it also shows the history of the project to be 3 yrs, and it says "123 Person Years". 123/3 = 41 persons, so this would mean everybody committing was full time?

  2. eclipse: ohloh guesses "2004 Person Years", and it says "6 yrs old". 2004/6 = 334 persons, working full time? but ohloh also says "Over the entire history of the project, 182 contributors have submitted code. 78 have done so in the last year."

  3. gcc: ohloh guesses "847 person years". "Over the entire history of the project, 335 contributors have submitted code. 149 have done so in the last year."

  4. mediawiki: ohloh guesses 206 person years. ohloh can also do committer cloning: brion vibber is counted as both "brion" and "vibber" :)
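The arithmetic behind the list above can be sketched as follows. This is only an illustration using the figures quoted in this post (gcc's age of 12 yrs is the one mentioned in the question below); the function name is made up for the example.

```python
# Figures as quoted in this post; nothing here is fetched from ohloh itself.
projects = {
    "maven":   {"person_years": 123,  "age_years": 3,  "contributors": 38},
    "eclipse": {"person_years": 2004, "age_years": 6,  "contributors": 182},
    "gcc":     {"person_years": 847,  "age_years": 12, "contributors": 335},
}

def implied_full_time_staff(person_years, age_years):
    """Average full-time headcount needed to spend the estimate within the project's age."""
    return person_years / age_years

for name, p in projects.items():
    staff = implied_full_time_staff(p["person_years"], p["age_years"])
    print(f"{name}: about {staff:.0f} implied full-time people, "
          f"{p['contributors']} contributors over the whole history")
```

For maven and eclipse the implied full-time headcount is larger than the total number of contributors ever seen, which is the discrepancy being asked about.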

so i'm wondering:

  * is a highly complex, difficult to write compiler grown in 12 yrs in approximately the same league as a fairly simple php script like mediawiki?
  * how do the eclipse numbers fit in the whole picture?

btw, what do you think to put your own source code online, and also measure it?

ThurnerRupert over 8 years ago
 

hi ThurnerRupert,

Thanks for bringing this up! You're totally right: the cost calculator seems pretty wacky. What you see in the cost calculator is the COCOMO model's answer. It works off of the total LOC in a project and determines how long it would take to produce them from scratch. We multiply this time by a salary (55k/year by default) and come up with the estimated project cost.
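To make the calculation concrete, here is a minimal sketch of a LOC-based COCOMO estimate. It assumes the textbook Basic COCOMO "organic" coefficients (2.4 and 1.05); this thread does not say which coefficients the calculator actually uses, and the function names are illustrative only.

```python
# Basic COCOMO, "organic" mode: effort in person-months = A * KLOC ** B.
# A and B are the textbook values (an assumption, not confirmed in this thread).
A, B = 2.4, 1.05
MONTHS_PER_YEAR = 12
DEFAULT_SALARY = 55_000  # per year, the default mentioned above

def cocomo_person_years(loc):
    """Estimated effort to write `loc` lines from scratch, in person-years."""
    kloc = loc / 1000.0
    person_months = A * kloc ** B
    return person_months / MONTHS_PER_YEAR

def estimated_cost(loc, salary=DEFAULT_SALARY):
    """Estimated effort multiplied by a yearly salary."""
    return cocomo_person_years(loc) * salary

if __name__ == "__main__":
    loc = 1_200_000
    print(f"{loc} LOC: {cocomo_person_years(loc):.0f} person-years, "
          f"cost {estimated_cost(loc):,.0f}")
```

Note that the only input is the line count. Commit history, number of contributors, and project age do not enter the formula at all, and because the exponent is above 1, doubling the code more than doubles the estimate.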

Short answer: COCOMO's an industry standard that can produce inaccurate results. We provide it as a service since we've had many requests for it. We would like to improve or replace COCOMO with something more accurate someday.

Long answer:

The COCOMO discrepancies you mention can arise from valid scenarios:

  1. A project kicks off by committing the codebase of its previous version to a new source control system. Ohloh will see this as short activity (just one commit!) - however COCOMO will tally up the LOC and determine that it would have taken (possibly) many years to write this from scratch.

  2. A project restricts commit access to their source control system. As a result, a single developer reviews and commits a community's submitted patches. Again - the source control system will show only one active person, yet the total amount of person-years going into the project is much greater.

  3. An "open source" company scrubs their public source control system by dumping all of their work into it at a regular interval (e.g. once a month). The source control activity won't properly show exactly how much work went on behind the scenes.

There are more subtle variants of these cases too.

So what to make of the calculator? At this point the calculator is mostly a way for lay-people to get a better idea of the magnitude of a project -- since they can't tell if 1.2 million lines of C/C++ is a lot or not. Reading much more into it at this point is pointless. We have a huge list of improvements in this space we need to address.

As a slight tangent, I think that, in the future, the most accurate answer will likely come from a blend of COCOMO and the source control system's history. I'd love to engage anyone in a discussion on that. I invite anyone to ping me if you're interested (jason@ohloh.net or in these forums).

is a highly complex, difficult to write compiler grown in 12 yrs in approximately the same league as a fairly simple php script like mediawiki?

To be fair, the COCOMO model has many input variables to try to account for system/language/dependency complexities. We don't use them - we treat a line of C with the same value as a line of PHP or Shell Script. We frankly have bigger fish to fry than to try to tweak the formula that way. For example, enabling projects to filter directories as "documentation" or "external libs" would likely provide much better results than worrying about COCOMO exponents.
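As a rough idea of what that kind of directory filtering could look like, here is a small sketch. Everything in it is hypothetical: the directory names, the per-file line counts, and the function names are made up for illustration and do not describe an existing Ohloh feature.

```python
from pathlib import PurePosixPath

# Hypothetical directory names a project might flag as documentation or external libs.
EXCLUDED_DIRS = {"docs", "vendor", "third_party"}

def is_counted(path, excluded=EXCLUDED_DIRS):
    """True if none of the directories containing the file is flagged as excluded."""
    return not any(part in excluded for part in PurePosixPath(path).parent.parts)

def filtered_loc(loc_by_file, excluded=EXCLUDED_DIRS):
    """Sum line counts over the files that are not under an excluded directory."""
    return sum(loc for path, loc in loc_by_file.items() if is_counted(path, excluded))

if __name__ == "__main__":
    # Made-up line counts, for illustration only.
    sample = {
        "src/parser.c": 1200,
        "docs/manual.html": 5000,
        "vendor/zlib/inflate.c": 3000,
    }
    print(filtered_loc(sample), "of", sum(sample.values()), "lines counted")
```

Only the filtered total would then be fed into the estimate, so bundled libraries and generated documentation would stop inflating the person-year figure.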

how do the eclipse numbers fit in the whole picture?

I'm not sure what you mean. Could you elaborate?

btw, what do you think to put your own source code online, and also measure it?

We're a startup still figuring out our business model. We'd love to be open source but we still need to evaluate the pros and cons. Btw - we actually do measure ourselves internally on our staging servers ;-).

Jason Allen over 8 years ago
 

Hi Jason,

These are good answers, thanks for taking the time to spell it out. I hadn't realised until a friend pointed out that you were using COCOMO. He actually compared your stats with another stats package that implements COCOMO and found the answers to be almost identical. That was reassuring for me.

Also I just did a presentation to a government crowd about all the projects in my stack. Having a LOC number was crucial to establishing the serious nature of these projects. And since I could tell them it used the COCOMO method, many folks instantly knew how much weight to put in the stats. So, thanks for using a known process instead of rolling your own. An imperfect but understood answer via COCOMO is better than one that I can't defend or explain.

Tyler Mitchell over 8 years ago
 

jason, thanks for the reply!

stating mediawiki as 210 person years when it is more likely in a 5-10 person yrs league is so way off that i do not think you have a bigger fish to fry :)

do you expect that the effort goes up if you put php measurement into it?

and the discrepancy between the number of people who actually worked on eclipse and the number you guess should have worked on it is also way off. but here i cannot judge the actual effort behind it.

ThurnerRupert over 8 years ago
 
