Wrong "Visual Basic" detected.

Hi. Today I added the project repository for the project Fresh IDE and the spider detected more than 4000 lines of Visual Basic. But the project is written in plain assembly language (FASM) and there is no VB at all.

Another issue is the CSS: about 3000 lines were detected, but the project contains at most 500 lines of CSS (counting white space), all in the documentation.

I think the spider wrongly classifies some of the FASM macro lines as VB or CSS.

What can be done in order to fix this issue?

Regards

johnfound almost 10 years ago
 

After I set some files to be ignored (because they were imported from other projects), the CSS line count seems to be OK now. But the Visual Basic count still is not. There is no VB of any kind in the project.

johnfound almost 10 years ago
 

John,

I will take a look, but be advised: we often find that an editor mode declaration recorded in a file's comments will fool the source recognition engine.

Stand by.

ssnow-blackduck almost 10 years ago
 

John,

It all looks straightforward to me, but I should refer this to management for a closer look. Please feel free to check out our open-source analysis code at https://github.com/blackducksw/ohcount and perhaps you'll see the problem. We gladly accept pull requests, and one would speed up the fix a great deal. The code is written mostly in Ruby, but I seem to remember it's well commented.

Thanks!

ssnow-blackduck almost 10 years ago
 

Is it possible to see somewhere what files are detected as Visual Basic?

johnfound almost 10 years ago
 

John,

The only way I can think of would be to run the code in that GitHub project against a local copy of the repository. I suspect that's what the technical guru will do...

I also notice that no code appears on the Ohloh Code page so far, but the project is so new that it may not have been fully processed yet. That page uses a completely different method of recognition, so it might be revealing. Truth is, we haven't had a big focus on assembly languages, so it's possible we need to sharpen our recognition skills on both sides.

Let me know if you find anything and I'll do the same...

Thanks!

ssnow-blackduck almost 10 years ago
 

Well, I am a very bad C/C++ programmer and I am allergic to regular expressions, so I probably will not suggest solutions, but here are the problems I found at first glance:

  1. The files detected as VB have the extension frm, which I use for the visually created (assembly language) forms. IMHO, it is too common an abbreviation (of "form") to be reserved for VB only.

  2. One unexpected problem: files with the extension inc are very common in assembly programming (also in Delphi/Pascal and probably in many other languages).
    The inc files in my project are detected as NULL, because ohcount checks for only two cases, binary and PHP, and fails otherwise.

johnfound almost 10 years ago
 

BTW, after some analysis, I think the problem is not solvable in general. The only proper solution is to let users provide a per-project extension-to-language list that overrides the defaults. This way, the auto-detector covers the regular, most common cases, and the private lists handle only the exceptions.
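A minimal sketch of that idea in Python, to make the proposal concrete. Everything here is hypothetical: the `DEFAULT_EXTENSIONS` table, the `language_for` function, and the shape of the per-project list are assumptions for illustration, not part of ohcount or Ohloh.

```python
import os

# Hypothetical site-wide defaults (illustrative subset, not ohcount's real table).
# None stands for "could not be classified".
DEFAULT_EXTENSIONS = {
    ".asm": "assembler",
    ".css": "css",
    ".frm": "visualbasic",
    ".inc": None,
}


def language_for(path, project_overrides=None):
    """Return the language for a file, letting a per-project list win over the defaults."""
    ext = os.path.splitext(path)[1].lower()
    if project_overrides and ext in project_overrides:
        return project_overrides[ext]
    return DEFAULT_EXTENSIONS.get(ext)


# Hypothetical private list for an assembly project.
fresh_overrides = {".frm": "assembler", ".inc": "assembler"}

print(language_for("forms/Main.frm"))                    # default mapping applies
print(language_for("forms/Main.frm", fresh_overrides))   # project list overrides it
```

The defaults stay untouched for every other project; only the extensions named in the private list change meaning.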

johnfound almost 10 years ago
 

John,

I would agree with you if we operated only on the basis of extensions, but we also implement other forms of analysis to determine the type of file being analyzed. It can get complex and is not without its faults, but we've had some success identifying the language even for files with commonly used extensions. Management is monitoring this exchange and thinking about the issue, and hopes to have a solution, though it will take some time since man-hours are allocated pretty fully just now.

Thanks!

ssnow-blackduck almost 10 years ago
 

John, you've hit the nail on the head.

Ohcount first uses known file extension mappings to identify languages. One can see in the Ohcount source that the frm extension is mapped to Visual Basic.

To change this behavior, a new disambiguation method would need to be created in the detector.c file.

The same is true for the inc files. One can see in detector.c that disambiguate_inc indeed detects only binary or PHP files.
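The two-step scheme described above (extension lookup first, then a content check for ambiguous extensions) could be sketched roughly as follows. This is a Python illustration only, not ohcount's C code: the table, the function names other than the idea of `disambiguate_inc`, and the assembly heuristics are all assumptions. The one fact relied on is that VB6 form files normally open with a `VERSION x.xx` line followed by a `Begin VB.Form` block.

```python
import re

# VB6 .frm files normally start with "VERSION 5.00" and a "Begin VB.Form ..." block.
VB_FORM_MARKER = re.compile(r"^\s*(VERSION\s+\d+\.\d+|Begin\s+VB\.)",
                            re.IGNORECASE | re.MULTILINE)

# Assumed heuristic for assembly sources: ';' comments or a few common keywords.
ASM_HINT = re.compile(r"^\s*(;|mov\b|push\b|call\b|proc\b|macro\b|include\b)",
                      re.IGNORECASE | re.MULTILINE)


def disambiguate_frm(contents):
    """Hypothetical content check for .frm files."""
    if VB_FORM_MARKER.search(contents):
        return "visualbasic"
    if ASM_HINT.search(contents):
        return "assembler"
    return None


def disambiguate_inc(contents):
    """Hypothetical extension of the binary/PHP check with an assembly case."""
    if "\0" in contents:          # treat files containing NUL bytes as binary
        return None
    if "<?php" in contents:
        return "php"
    if ASM_HINT.search(contents):
        return "assembler"
    return None


# Extension table: a string is a fixed language, a function is a disambiguator.
EXTENSION_MAP = {
    "asm": "assembler",
    "css": "css",
    "frm": disambiguate_frm,
    "inc": disambiguate_inc,
}


def detect_language(filename, contents):
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    entry = EXTENSION_MAP.get(ext)
    return entry(contents) if callable(entry) else entry
```

With this shape, a real VB form such as `"VERSION 5.00\nBegin VB.Form Form1\n"` is still reported as Visual Basic, while a `.frm` file that looks like assembly source is not.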

Might you be interested in submitting a pull request with an improvement?

Peter Degen-Por... almost 10 years ago
 

"Might you be interested in submitting a pull request with an improvement?"

Well, as I already said, I am not a C/C++ programmer; I can barely read C/C++. The above findings are based on compiling the code and reading a little of the source. I definitely can't make the needed changes. :(

Of course I could write this code in assembly language, but I am not sure Ohloh would accept such an improvement.

johnfound almost 10 years ago
 

As I said, my C/C++ skills are very poor, but I managed to change the counter to force it to detect .INC and .FRM files as assembly language.

I did this not as a solution (it is not one) but as a way to estimate the error that the wrong logic causes for assembly projects. Here are the statistics:

Proper counting (.inc and .frm files forced to assembler):

Language          Files       Code    Comment  Comment %      Blank      Total  % of Total
----------------  -----  ---------  ---------  ---------  ---------  ---------  ----------
assembler           465      88955      13243      13.0%      25110     127308       93.1%
html                  5       7602          0       0.0%       1182       8784        6.4%
css                   4        451         16       3.4%        118        585        0.4%
shell                 4         43          4       8.5%          6         53        0.0%
bat                   4         19          0       0.0%          0         19        0.0%
xml                   2         19          0       0.0%          2         21        0.0%
sql                   1          7          0       0.0%          0          7        0.0%
----------------  -----  ---------  ---------  ---------  ---------  ---------  ----------
Total               485      97096      13263      12.0%      26418     136777

Wrong counting (from the Ohloh site):

Language               Code    Comment  Comment %      Blank      Total  % of Total
----------------  ---------  ---------  ---------  ---------  ---------  ----------
Assembly              62119      10692      14.7%      20551      93362       86.0%
HTML                   7602          0       0.0%       1182       8784        8.1%
Visual Basic           4265          0       0.0%       1434       5699        5.3%
CSS                     451         16       3.4%        118        585        0.5%
shell                    47          5       9.6%          6         58        0.1%
bat                      19          0       0.0%          0         19        0.0%
XML                      19          0       0.0%          2         21        0.0%
SQL                       7          0       0.0%          0          7        0.0%
----------------  ---------  ---------  ---------  ---------  ---------  ----------
Totals                74529      10713                 23293     108535
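
The size of the error can be worked out directly from the code-line columns of the two tables. The short Python sketch below uses only the numbers posted above; the variable names are made up for illustration, and reading the remaining gap as "not counted at all" is an interpretation (presumably the .inc files detected as NULL), not something the site reports.

```python
# Code lines per language, copied from the two tables above.
proper = {"assembler": 88955, "html": 7602, "css": 451, "shell": 43,
          "bat": 19, "xml": 19, "sql": 7}
site = {"assembler": 62119, "html": 7602, "visualbasic": 4265, "css": 451,
        "shell": 47, "bat": 19, "xml": 19, "sql": 7}

# Assembly code lines the site fails to attribute to assembly.
missing_assembly = proper["assembler"] - site["assembler"]

# Lines the site reports as Visual Basic (the .frm files).
misfiled_as_vb = site["visualbasic"]

# Code lines missing from the site totals altogether.
total_gap = sum(proper.values()) - sum(site.values())

print(missing_assembly)   # 26836
print(misfiled_as_vb)     # 4265
print(total_gap)          # 22567
print(round(100 * missing_assembly / proper["assembler"], 1))  # share of assembly code lost
```

So roughly 30% of the assembly code lines are missing from the Assembly row on the site: 4265 of them show up as Visual Basic, and about 22567 code lines do not appear in the site totals at all. The shell rows differ by 4 lines (43 versus 47), which accounts for the small mismatch between 26836 - 4265 and 22567.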

johnfound almost 10 years ago