Code analysis of "GNU Autoconf Archive" project is wrong

The code analysis of the GNU Autoconf Archive says that 46% of the code is written in Python. However, that cannot possibly be true. The vast majority of source code in that project is written for Autoconf, i.e. it resides in m4/*. Is it possible that the statistics gatherer doesn't recognize these files correctly?

Peter Simons about 14 years ago
 

Yes, Ohloh does not recognize *.m4 files at all, and they are completely ignored by our line counter.

This may be a simple fix -- we have an autoconf parser, but it only processes configure, configure.in, and configure.ac files. I'm not an autoconf user. Would the fix be as simple as passing all *.m4 files to the same parser we use for configure.* files?

Robin Luckey about 14 years ago
 

Hi Robin,

*.m4 files that are inputs for Autoconf can be parsed with the same parser as the configure.ac scripts.

Now, the vast majority of *.m4 files will be directed at Autoconf -- especially in a free software environment. There are other uses, though, that the Autoconf parser would not be able to recognize. M4 is just a general-purpose text pre-processor, like cpp. Sendmail, for example, uses m4 to generate its configuration file, which is totally unrelated to Autoconf. So does the SELinux kernel.

As a rule of thumb, it's probably safe to assume that an *.m4 file is related to Autoconf if it contains the string AC_DEFUN, which is the command used to define an Autoconf macro. There is also the file aclocal.m4, which is commonly used by the Autoconf and Automake packages. Oftentimes, Autoconf macros reside in a directory called m4 or build-aux, which might be a clue, too.

I hope this helps!

Peter Simons about 14 years ago
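
The rule of thumb in the post above can be sketched in Python roughly as follows. This is only an illustration of the three clues Peter lists (the AC_DEFUN string, the aclocal.m4 file name, and the m4 or build-aux directory names); it is not how Ohcount does it, and the function name looks_like_autoconf is hypothetical.

```python
from pathlib import PurePosixPath

# Directory names that, per the post above, often hold Autoconf macros.
AUTOCONF_DIRS = {"m4", "build-aux"}


def looks_like_autoconf(path: str, text: str) -> bool:
    """Guess whether an *.m4 file is Autoconf input (hypothetical helper).

    path is a '/'-separated path inside the project, text is the file content.
    """
    p = PurePosixPath(path)
    if p.suffix != ".m4":
        return False
    # Clue 1: the file defines at least one Autoconf macro.
    if "AC_DEFUN" in text:
        return True
    # Clue 2: the conventional file name used by Autoconf and Automake.
    if p.name == "aclocal.m4":
        return True
    # Clue 3: the file sits in a directory conventionally used for macros.
    return any(part in AUTOCONF_DIRS for part in p.parts[:-1])
```

A file such as m4/ax_check.m4 would be accepted on the directory clue alone, whereas a Sendmail-style cf/sendmail.m4 without AC_DEFUN would be rejected. Since the clues are heuristics, some false positives and negatives are to be expected.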
 

Hi Robin,

is there any chance that the line counter could be changed to honor *.m4 files? Even a trivial solution would be better than the current state where those files are completely ignored, IMHO. The statistics for the autoconf archive are really way off. :-(

Take care,
Peter

Peter Simons about 14 years ago
 

Hi Peter,

Thanks for all of the information -- I think it is enough to implement a decent m4 parser.

I do think this is a relatively simple improvement, but I can't promise when we'll have time to open up the Ohcount code again.

I've created a ticket for this change. If you are really eager to see the change, and want to take a crack at it yourself, you can find the source code here.

Thanks,
Robin

Robin Luckey about 14 years ago
 

Hi Robin,

I've attached a patch to the Trac ticket. It worked fine for me, and the test suite still succeeds, too. I hope it's useful.

Take care,
Peter

Peter Simons about 14 years ago
 

Hi,

is there any feedback regarding my patch? Would you like to apply it? Or is there something wrong with it?

Take care,
Peter

Peter Simons about 14 years ago
 

Hi Peter,

Sorry for the lack of response. Yes, I do plan to apply your patch, even though I am usually a real stickler about patches that don't include tests :-).

I can't promise how soon we will deploy a new version of the line counter. We've got quite a backlog of patches building up, so perhaps we'll find some time to catch up on it this week.

Thanks for having the interest and motivation to look into this.

Robin

Robin Luckey about 14 years ago
 

I went through the backlog of Ohcount patches today, and the autoconf change is now live on our servers.

I ran a clean recount of the GNU Autoconf Archive, and you can see the updated results online now. There's a great deal of autoconf code detected.

It seems, however, that our parser does not recognize # as a comment prefix, so it found 120K lines of autoconf, but hardly any comments. I don't think anyone has carefully examined the results of the autoconf parser before, so I'm not surprised this bug has survived for so long.

Robin Luckey about 14 years ago
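
To illustrate the comment issue described in the post above, here is a minimal Python sketch of a line classifier that treats a leading # as a comment. It also treats a leading dnl (the m4 "discard to newline" builtin, commonly used for comments in Autoconf input) as a comment, which the thread itself does not mention. The names classify_line and count_lines are hypothetical, and the sketch ignores m4 quoting, so it should be read as an approximation rather than a description of Ohcount's parser.

```python
import re
from collections import Counter

# A line counts as a comment if its first non-blank token is '#' or 'dnl'.
COMMENT_RE = re.compile(r"(#|dnl\b)")


def classify_line(line: str) -> str:
    """Return 'blank', 'comment', or 'code' for one line of Autoconf input."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    if COMMENT_RE.match(stripped):
        return "comment"
    # Lines with a trailing comment after code are counted as code.
    return "code"


def count_lines(text: str) -> Counter:
    """Tally line categories over a whole file."""
    return Counter(classify_line(line) for line in text.splitlines())
```

With a classifier along these lines, a macro file that starts with a block of # lines would report those lines as comments instead of code.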
 

Hi Robin,

thank you very much for updating the installation; the new results for the Autoconf Archive are much more accurate. Thanks!

I notice that Ohloh doesn't consider Autoconf code when computing project costs. As a result, it believes that the Autoconf Archive requires less than 1 person-year to duplicate from scratch. In fact, however, it took more like 10 person-years to accomplish that. I guess that the algorithm considers only commits to Python scripts and Makefiles?

Peter Simons about 14 years ago
 

You're correct: Ohloh disregards Autoconf for purposes of computing project cost. For the vast majority of projects, this is the right decision.

There was a time when we would frequently get the reverse complaint: for a lot of small projects, the volume of generated Autoconf output could dwarf actual project code.

If we can devise some kind of simple heuristic to drive the decision whether or not to include the Autoconf in the cost calculation, we could probably put something in place to accommodate this project.

Another option could be to simply let the user drive the calculation. You'll also see that the drop-down list in the cost calculator lets you choose Markup and Code, Code Only, and Markup Only. At the risk of creating an explosion of UI options, we could perhaps add an option to include or exclude the build script (i.e., autoconf) contents.

Robin Luckey about 14 years ago
 

Hi Robin,

you are right. It's going to be hard to come up with an automatic scheme to decide whether Autoconf code is supposed to be included in project cost calculations or not. Custom-written Autoconf code can be quite sophisticated and is thus expensive. On the other hand, Gnulib or Automake can generate a fairly large amount of Autoconf code almost effortlessly and thus inexpensively. I wouldn't know of a way to distinguish these two cases. There's probably no silver bullet to answer that question.

Allowing a human to decide -- to configure the calculator -- is probably the way to go.

Take care,
Peter

Peter Simons about 14 years ago
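
The user-driven option discussed in the last two posts could look something like the following Python sketch. It assumes a Basic COCOMO organic-mode estimate (effort = 2.4 × KLOC^1.05 person-months); the thread does not say which formula the cost calculator actually uses, so the formula, the function name person_years, and the include_build_scripts flag are all assumptions made for illustration.

```python
from typing import Mapping

# Languages treated as build scripts; the thread only names autoconf.
BUILD_SCRIPT_LANGUAGES = {"autoconf"}


def person_years(lines_by_language: Mapping[str, int],
                 include_build_scripts: bool = False) -> float:
    """Estimate effort in person-years (assumed Basic COCOMO, organic mode)."""
    total_lines = sum(
        count
        for language, count in lines_by_language.items()
        if include_build_scripts or language not in BUILD_SCRIPT_LANGUAGES
    )
    kloc = total_lines / 1000.0
    person_months = 2.4 * kloc ** 1.05
    return person_months / 12.0


# Illustrative line counts only: 120K autoconf lines as mentioned in the
# thread, plus a made-up amount of Python.
counts = {"autoconf": 120_000, "python": 2_000}
default_estimate = person_years(counts)
opt_in_estimate = person_years(counts, include_build_scripts=True)
```

Leaving the flag off by default preserves the behaviour Robin describes as right for most projects, while a project like the Autoconf Archive could opt in.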