Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FORCE_SSE is always active on i386/amd64 #297

Closed
ra1nb0w opened this issue Nov 2, 2020 · 4 comments
Closed

FORCE_SSE is always active on i386/amd64 #297

ra1nb0w opened this issue Nov 2, 2020 · 4 comments

Comments

@ra1nb0w
Copy link
Contributor

ra1nb0w commented Nov 2, 2020

If you use this line, FORCE_SSE is always active and you can't compile direwolf with native cpu flags on i386/amd64 architecture.

The correct way is to set

option(FORCE_SSE "Compile with SSE instruction only" ON)

then, at compile time, you can choose which flags to use or (if all are OFF) to use the auto-discovery feature.

@ra1nb0w ra1nb0w changed the title FORCE_SSE is always active FORCE_SSE is always active on i386/amd64 Nov 2, 2020
@wb2osz
Copy link
Owner

wb2osz commented Nov 8, 2020

First a little history, so we can all understand the original intent before cmake was involved.
When building for an x86 32 bit target, the compiler assumed a really really ancient CPU. I found that using the SSE instruction set made it run much faster. (i.e. using either the -msse or -march=pentium3 gcc option) The newer SSE instructions didn't make a difference so there is no point in using them with this application.
We have to consider the case where someone is producing a binary distribution for others. We don't want to use -march=native because the result might not run properly on an older CPU. We also have the case where someone is building a 32 bit distribution on a 64 bit build machine. The generated code should not depend on the characteristics of the build machine, just the intended target.
So I think the ideal situation would be to make presence of SSE instructions or Pentium 3 the default minimum requirement for an x86 32 bit target. Those building a distribution for others can have a minimum requirement of a Pentium 3. Someone who has something older than a Pentium 3 should have a way to disable that minimum requirement when building for own use.
Of course none of this applies to 64 bit target. The generic 64 bit target implies SSE and many other instructions so nothing special would need to be done.
Am I making sense?

Maybe another way to say it would be...

For an x86 32 bit target, the default would be Pentium 3 architecture. Building for a later 32 bit CPU does not seem to offer any benefit. This would be good for packagers making binary distributions for others. It would be good to have some way to turn off that requirement for someone with a computer from the previous Century. (In all these years I only heard of one such case.)

For an x86_64 target, just assume the generic 64 bit architecture. That would be safe for making binary distributions for others. I have not yet done any experiments to see if -march=native offers any benefit for a 64 bit target.

@ra1nb0w
Copy link
Contributor Author

ra1nb0w commented Nov 13, 2020

Yes, that was the intent. But with the line you are forcing the SSE behaviour and no options can change that fixed assignment. If I remember well, I enabled by default FORCE_SSE through option and if someone would to experiment with other settings can do that.
Therefore you should remove that line and make option(FORCE_SSE to ON. You obtain the same default result (if there is no other assignment elsewhere) but the user can choose which flag to use.

@wb2osz
Copy link
Owner

wb2osz commented Nov 14, 2020

Fixed in dev branch. adebd06

Thanks for all your help!

@wb2osz wb2osz closed this as completed Nov 14, 2020
@ra1nb0w
Copy link
Contributor Author

ra1nb0w commented Nov 14, 2020

if you don't remove this line the change is useless.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants