-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Use native dot product in Lucene #15508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Sharing some JMH benchmark results from Graviton 2 and Graviton 3 with this PR
Graviton 2 Graviton 3 Apple M2 Pro Note : |
|
I do have some minor comments but the large one is: how do we handle native code for end-users? Do we ship multiple precompiled binaries? Do we ship none? The current build modifications works "for you" but to compile for a matrix of all the possibilities is a bit of a nightmare. |
dweiss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this. The integration with native libs here is... tight. And to get a working version, you pretty much have to recompile from scratch on the target machine.
It would be more elegant to have it somehow wrapped in a service and a separate (optional) module. I'm not sure if it's possible.
Also - as I mentioned - some thought needs to be given to what the public artifacts are going to be - are binaries going to be shipped, for which cpus, how are they going to be compiled and what dev requirements this entails (installing cross-compilation environments is probably not going to fly with many).
build.gradle
Outdated
| plugins { | ||
| id "base" | ||
| id "lucene.root-project.setup" | ||
| id "c" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be moved to the project which actually uses the plugin (misc?).
lucene/core/build.gradle
Outdated
| test { | ||
| dependsOn ':lucene:misc:buildNative' | ||
| systemProperty( | ||
| "java.library.path", | ||
| project(":lucene:misc").layout.buildDirectory.get().asFile.absolutePath + "/libs/dotProduct/shared" | ||
| ) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these changes should be pulled into a single gradle java plugin that handles the configuration across all the involved projects. If native libs are not enabled, these changes shouldn't be applied at all. A single plugin will make it easier to see the set of changes applied globally; currently they're scattered around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, makes sense
|
For the record, any Java library that ships with native libs has similar integration problems. You can look at jansi or any other lib that has native components for inspiration (and to get an idea) how hairy it becomes. Here's jansi's docker-based cross-compilation build. |
|
Thanks @dweiss for taking a look. I completely agree with maintaining different native bindings in Lucene can get really challenging as you mentioned and I propose we don't ship don't ship any pre-built binaries (or to say at least in this PR today if someone feels otherwise). I got some cross environment binaries working but I agree we might not be able to support all permutations/combinations of the different environments in the best possible way. While I got some cross-platform binaries working for few selective environments, supporting all environment permutations will be really painful. I added the build configuration to test across different environments on my end. I'm more happy to remove the Gradle build configurations for binary generation. One important point here is that GCC's auto-vectorization of simple C code performs comparably to (and sometimes better than) hand-written NEON/SVE implementations(except some cases/runs for SVE). This allows us to keep things simple on Lucene's end and let users experiment. This is inline with what @rmuir proposed in this comment. So here is what I propose we can do and would look like for user :
Contract: This PR would enable the interface to interact with binaries for dot product computations but providing that binary is in user's court. Users need to provide the binary with the required signature/impl:
Let me know if this makes it clear for what it would look like from Lucene user's perspective. |
|
To expand on what exactly a Lucene user would look like on their end to test/enable this :
// Linux/Unix
gcc -shared -O3 -march=native -funroll-loops -o /home/simd/libdotProduct.so /home/apachelucene/lucene/misc/src/c/dotProduct.c
// macOS
gcc -shared -O3 -march=native -funroll-loops -o /home/simd/libdotProduct.dylib /home/apachelucene/lucene/misc/src/c/dotProduct.c
// Windows
gcc -shared -O3 -march=native -funroll-loops -o /home/simd/dotProduct.dll /home/apachelucene/lucene/misc/src/c/dotProduct.c// dotProduct.c
int32_t dot8s(int8_t vec1[], int8_t vec2[], int32_t limit) {
// dot product impl
}
./gradlew test // PASSES, since by default native dot product is not testes
./gradlew test -Ptest.native.dotProduct=false // PASSES, switched off
./gradlew test \
-Ptest.native.dotProduct=true \
-Ptests.jvmargs="-Djava.library.path=/home/simd" // PASSES
./gradlew test -Ptest.native.dotProduct=true // FAILS, needs library to be linked
java --enable-native-access=ALL-UNNAMED \
--enable-preview \
-Djava.library.path="/home/simd" \
-jar lucene/benchmark-jmh/build/benchmarks/lucene-benchmark-jmh-11.0.0-SNAPSHOT.jar \
regexp "binaryDotProductVector|dot8sNative" |
|
I get the performance improvement is nice but it'll be difficult to maintain and test properly the way it's currently implemented. Not to mention most folks out there use Lucene without the possibility to recompile from sources.... I don't mind adding a native implementation but maybe this should be integrated via more standard mechanism (like service lookup) and then everything would live in that optional "native" module? This would open up the possibility to compile such a module independently. Sorry for my lack of enthusiasm about this but I can tell it'll be a problem to maintain even by looking at it (properties, etc.). |
i'd use zig |
|
I looked again and I still think a nicer way to plug this native impl. into Lucene would be to make another implementation of VectorUtilSupport ("NativeVectorUtilSupport") and then use it instead of the default Panama implementation, if it is available. If you used service lookup for picking the implementation, you could select between them when classes are initialized. Then the native implementation could go into its own module and there'd be no need for hacks like system properties, internals opened for benchmarks, etc. I may be missing something of course, but I think it'd be a much easier way forward. What do you think? |
|
@dweiss I tried to make it more pluggable and added the I removed all the gradle related changes and not generate any binary at all for now. We can add support for more functions as we go but I think this is a good start?. There is no requirement to have a system property, though users could still pass Note : I kept the C code (as is in misc) even though we are not building since it could be used as a reference for users or provides a good starting point.
|
Very Interesting! I'd love to explore this path and would be good in general if we could have some reliable way to provide efficient binaries to users. Maybe we could pursue this as a followup and keep this change just focussed on adding support to use the provided binary (next we could try adding actual binary support)? |
|
It's not what I had in mind, sorry for being vague. I have a strong feeling that the entire thing should be implemented using standard ServiceLoader mechanisms. So the default singleton in VectorizationProvider.Holder.INSTANCE should be instantiated using a service and the logic of this method - should be moved to individual service implementations (verification if a particular implementation can be used in the current environment, along with the instantiation of that implementation). The "singleton lookup" should only load all service providers, filter out what cannot be used for whatever reason and then pick one of the remaining candidates in their preferred order (which can be controlled by a system property, for example lucene.vectorization-provider=*,panama,default would indicate the desired ordering among available implementations). So, by default we'd have the "default" (DefaultVectorizationProvider) fallback and "panama" (PanamaVectorizationProvider) but you could add another service implementation (native). The service would need to implement two methods - one to check if it can be used and the other to provide an instance of VectorizedProvider. It isn't a straightforward patch because a lot of the code is currently package private and intentionally hidden (and won't allow subclassing). But I have a gut feeling it's possible and it would be a lot more elegant in the long run. It's also related to PRs like this one - #15294 which would like to "know" which service implementation is being used. This could be the name (or class) of the service provider, for example. I'm unfortunately away this week and won't be able to contribute directly. I would hold this patch until the above avenue can be explored though (either turns out to indeed work out nicely or won't work, for some odd reason). |
|
This is a quick and dirty PoC that I quickly wrote to better show what I mean - https://github.com/apache/lucene/compare/main...dweiss:lucene:vector-prov-service?expand=1 it doesn't fully work (there are some clashes between service loader and the module system, plus a ton of cleanups to be made) but if you take a look at VectorizationProvider class it'll give you an idea what I think those losely-pluggable implementations should be loaded like. Now... it's just an idea - I'm not really heavily advocating to switch to it but I think it'd be better in the long term if we somehow decoupled those multiple implementations (and this includes potentially removing the need for multi-release jars, which are difficult to work with). |
Yes, there's no need to create binaries here. Was just a suggestion to avoid some hellacious network of Dockerfiles: for this kind of purpose, zig can be used as a better c compiler that "just works" to cross-compile all those binaries. |
|
Thank you @dweiss for sharing your approach! This indeed is more decoupled and looks better to have separate providers convey themselves if it could be used or not based on the dynamic priority order. I've implemented your patch with an additional service implementation for the Native use case. Currently, this delegates to Panama as the fastest fallback, so the native provider is only available when Panama is usable and the required binary is present. |
|
Please give me some time - I've been away on a winter break. |
Description
This is continuation of the work @goankur started in #13572. Quoting the comment below from the initial PR.
To make use of faster SIMD instructions,
PanamaVectorUtilSupport#{dotProduct|uint8DotProduct}will switch to native dot product implementation whenorg.apache.lucene.util.Constants#NATIVE_DOT_PRODUCT_ENABLEDistrue(i.e. system property-Dlucene.useNativeDotProduct=trueis passed).GCC version : >= 10