forked from marian-nmt/marian-dev
ARM Backend using ruy for fp32 and int8 #79
Merged
Changes from 69 commits
79d7b33
ARM Backend for marian
jerinphilip 2ac7cbc
Fix sentencepiece submodule mixup
93b841b
Merge branch 'browsermt-master' into arm-backend
9674973
[sentencepiece] android cmake additional libs
f3e7818
Remove separately added patch in favour of submodule update
5250b9e
Remove trailing newline in integer_common.h to prettify diff
26d3ba2
Remove trailing newline in ruy_adapter.h
b7969b0
Merge branch 'browsermt-master' into arm-backend
b271b70
In-place multiply without malloc by reinterpret_cast
jerinphilip efa5a85
Documentation for the stdcpp/NEON paths created
jerinphilip 179f239
Remove templated abort transpose()
jerinphilip 0d189c8
Reinterpret at unquantize add bias as well as int32_t from float32_t
jerinphilip 8951261
Remove AlignedVector from ruy_adapter - not required here.
jerinphilip 49beb50
Remove ViaRuy::PrepareBias without effect to output
jerinphilip 3cf85f7
If SSE4.1 found use it to avoid perf regressions even if not -march=n…
jerinphilip 4edc8ef
Deduplicate multiply by capturing variability through callbacks
jerinphilip a414b60
Revert "If SSE4.1 found use it to avoid perf regressions even if not …
jerinphilip e2069bf
CMAKE_SYSTEM_PROCESSOR indicates x86 and native mode is not enabled, …
jerinphilip 1b4049a
Remove comments, now that callback is working
jerinphilip e522e6c
Minimal gemmRuy
jerinphilip 90858a5
Update CI
jerinphilip 557de0c
Using simd_utils instead of SIMDE
jerinphilip d10009f
Style fixes: UnquantizeAndWrite, UnquantizeAddBiasAndWrite
jerinphilip 3a37966
const for () operator overrides
jerinphilip 418a7ce
Explicit for single argument constructor: UnquantizeAndWrite
jerinphilip b7412c3
Fix typo
jerinphilip 071e0d4
Low compute path for special case alpha = 1.0
jerinphilip ec886bd
Remove clang only pragmas
jerinphilip 4df1998
Remove leftover bias cycles comment
jerinphilip 1defce6
Merge branch 'master' into arm-backend
jerinphilip c4be980
Defaults: intgemm for x86_64 and ruy and simd_utils for arm
jerinphilip b181847
Revert "Defaults: intgemm for x86_64 and ruy and simd_utils for arm"
jerinphilip 6e4c561
Target architecture detection for ARM
jerinphilip 53636cf
Remove DEBUG statements
jerinphilip be9e153
Remove IntBase inheritance; PrepareB still unimplemented
jerinphilip 876a915
Remove obsolete comment
jerinphilip d399a35
Remove logging statement in hotpath
jerinphilip 06b6dd9
Use CMAKE_CXX_FLAGS instead of add_definitions 🤦
jerinphilip e310f73
Check: Does add_compile_{defs,opts} propogate up?
jerinphilip 3bf1133
Fix typo: definitions
jerinphilip 5c8b1d2
Undo edit attempts manually for min-diff; Using compile_definitions now
jerinphilip 9dd1eff
Restore CMakeDependentOption; Rename only to ONNX_SGEMM
jerinphilip b055c11
Backtrack attempt to flatten ONNX_SGEMM out
jerinphilip d006196
USE_ONNX_SGEMM is a CMakeDependentOption
jerinphilip 39b7237
Keep pre armv8 TargetArch detect unchanged
jerinphilip 3a6c515
Simple ARM detection to no-op out shifted/shiftedAll paths
jerinphilip 46db01b
Add logging statements to indicate forced gemm-path change at constru…
jerinphilip 63fea9a
Remove run script
jerinphilip 82a15e1
Removing kStandardCpp - may add later for tests separately
jerinphilip 4a8c0da
Remove leftover gcc diagnostic pop for SIMDE
jerinphilip e17a5dd
Remove simde-no-tests reference in CMakeLists.txt file
jerinphilip 3c8a149
Remove obsolete comments
jerinphilip 800402c
Explain copying x86-SSE structure for NEON
jerinphilip 9d648d0
Remove executable upload for android
jerinphilip d19a312
Remove comment
jerinphilip 1b38e01
Restore -Werror
jerinphilip 9027ea4
Switch to a {{0}} sigaction on WASM, {0} for rest
jerinphilip a0ee527
Revert "Restore -Werror"
jerinphilip 6285f28
Use -DFMA for NEON from simd_utils example
jerinphilip 8895fda
Remove redundant neon_mathfun include after simd_utils.h
jerinphilip c6c3ac6
Wrap CmakeLists.txt ARM definitions with an if
jerinphilip 3baf620
Use __clang__ instead of WASM_COMPATIBLE_SOURCE; emcc uses LLVM
jerinphilip aa1842c
Suppress warnings by #pragma GCC diagnostic ...
jerinphilip 8eae08b
Re-enable -Werror
jerinphilip 9a541c4
{0} -> {} to work around empty-braces Werror
jerinphilip 4b80399
Replace -Wall with -Wcomment
jerinphilip ac8de91
Revert "Replace -Wall with -Wcomment"
jerinphilip 38b608a
Disable formatting then local edit -Wall -> -Wcomment
jerinphilip 86c8d44
Do not check for BLAS on usual ARM, except Mac: Apple Accelerate
jerinphilip 861e31d
Fix endif: CMakeScript quirks
jerinphilip
@@ -0,0 +1,120 @@
name: ARM
'on':
  push:
    branches:
      - main
      - ci-sandbox
  pull_request:
    branches:
      - '**'
env:
  ccache_basedir: ${{ github.workspace }}
  ccache_dir: "${{ github.workspace }}/.ccache"
  ccache_compilercheck: content
  ccache_compress: 'true'
  ccache_compresslevel: 9
  ccache_maxsize: 200M
  ccache_cmake: -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_C_COMPILER_LAUNCHER=ccache
  ndk: "${{ github.workspace }}/android-ndk-r23b"
  abi: "arm64-v8a"
  minsdk_version: 28
  android_platform: 28

jobs:
  ubuntu:
    name: "arm-v8a cross-compile via Android NDK"
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v2
        with:
          submodules: recursive

      - name: Install prerequisites
        run: |
          wget -c --quiet https://dl.google.com/android/repository/android-ndk-r23b-linux.zip
          unzip -qq android-ndk-r23b-linux.zip
          sudo apt-get -y install ccache cmake

      - name: Generate ccache_vars for ccache based on machine
        shell: bash
        id: ccache_vars
        run: |-
          echo "::set-output name=hash::$(echo ${{ env.ccache_compilercheck }})"
          echo "::set-output name=timestamp::$(date '+%Y-%m-%dT%H.%M.%S')"

      - name: Cache-op for build-cache through ccache
        uses: actions/cache@v2
        with:
          path: ${{ env.ccache_dir }}
          key: ccache-${{ matrix.identifier }}-${{ steps.ccache_vars.outputs.hash }}-${{ github.ref }}-${{ steps.ccache_vars.outputs.timestamp }}
          restore-keys: |-
            ccache-${{ matrix.identifier }}-${{ steps.ccache_vars.outputs.hash }}-${{ github.ref }}
            ccache-${{ matrix.identifier }}-${{ steps.ccache_vars.outputs.hash }}
            ccache-${{ matrix.identifier }}

      - name: ccache environment setup
        run: |-
          echo "CCACHE_COMPILER_CHECK=${{ env.ccache_compilercheck }}" >> $GITHUB_ENV
          echo "CCACHE_BASEDIR=${{ env.ccache_basedir }}" >> $GITHUB_ENV
          echo "CCACHE_COMPRESS=${{ env.ccache_compress }}" >> $GITHUB_ENV
          echo "CCACHE_COMPRESSLEVEL=${{ env.ccache_compresslevel }}" >> $GITHUB_ENV
          echo "CCACHE_DIR=${{ env.ccache_dir }}" >> $GITHUB_ENV
          echo "CCACHE_MAXSIZE=${{ env.ccache_maxsize }}" >> $GITHUB_ENV

      - name: ccache prolog
        run: |-
          ccache -s # Print current cache stats
          ccache -z # Zero cache entry

      - name: Generate buildfiles for marian on android via cmake
        run: |-
          mkdir -p build
          cd build
          NDK=${{ env.ndk }}
          ABI=${{ env.abi }}
          MINSDK_VERSION=${{ env.minsdk_version }}
          ANDROID_PLATFORM=${{ env.android_platform }}
          OTHER_ANDROID_ARGS=(
            -DANDROID_ARM_NEON=TRUE
          )
          OTHER_MARIAN_ARGS=(
            -DCOMPILE_CUDA=off
            -DCOMPILE_CPU=on
            -DCMAKE_HAVE_THREADS_LIBRARY=1
            -DCMAKE_USE_WIN32_THREADS_INIT=0
            -DCMAKE_USE_PTHREADS_INIT=1
            -DTHREADS_PREFER_PTHREAD_FLAG=ON
            -DBUILD_ARCH=armv8-a
            # -DCOMPILE_WITHOUT_EXCEPTIONS=on # Apparently this can reduce the binary size, let's see.
          )
          # Additionally list variables finally configured.
          cmake -L \
            -DCMAKE_BUILD_TYPE=Release \
            -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
            -DANDROID_TOOLCHAIN=clang \
            -DANDROID_ABI=$ABI \
            -DANDROID_PLATFORM=$ANDROID_PLATFORM \
            -DANDROID_NATIVE_API_LEVEL=$MINSDK_VERSION \
            -DANDROID_TOOLCHAIN_NAME=arm-linux-androideabi-4.8 \
            -DANDROID_STL=c++_static \
            -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_C_COMPILER_LAUNCHER=ccache \
            "${OTHER_ANDROID_ARGS[@]}" "${OTHER_MARIAN_ARGS[@]}" \
            ..


      - name: Build marian for android
        working-directory: build
        run: |-
          # Only build marian (lib) for now.
          make -j2

      - name: ccache epilog
        run: 'ccache -s # Print current cache stats'

      - uses: actions/upload-artifact@v2
        with:
          path: ${{github.workspace}}/build/marian-decoder
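The configure step in the workflow above collects its flags in bash arrays and expands them with `"${NAME[@]}"`. A minimal sketch of that pattern follows, assuming bash and using a hypothetical `print_args` function in place of `cmake` so it runs without the NDK. The flags are a subset of those in the workflow, and `ALL_ARGS` is a name introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: print_args is a hypothetical stand-in for cmake.
set -euo pipefail

print_args() {
  # Print each received argument on its own line.
  printf '%s\n' "$@"
}

OTHER_ANDROID_ARGS=(
  -DANDROID_ARM_NEON=TRUE
)
OTHER_MARIAN_ARGS=(
  -DCOMPILE_CUDA=off
  -DCOMPILE_CPU=on
  # -DCOMPILE_WITHOUT_EXCEPTIONS=on  # a commented-out entry adds no element
  -DBUILD_ARCH=armv8-a
)

# Quoted [@] expansion passes every element as a separate argument.
ALL_ARGS=("${OTHER_ANDROID_ARGS[@]}" "${OTHER_MARIAN_ARGS[@]}")
print_args "${ALL_ARGS[@]}"
```

Keeping optional flags as commented-out array entries is what lets the workflow toggle them without touching the `cmake` invocation itself.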
What does this mean, sorry? Is this 32bit vs 64bit? A small clarifying comment?
Does this catch the `unknown` arch condition, and is that desirable?
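The detection code these comments refer to is not part of this excerpt. As a generic illustration of the two questions (32-bit vs 64-bit, and what happens to an unrecognised architecture), here is a small sketch. It assumes a hypothetical `classify_arch` helper with made-up labels that maps `uname -m`-style strings to a family and sends everything else to an explicit `unknown` branch.

```shell
#!/bin/sh
# Sketch only: classify_arch and its labels are hypothetical, not from the PR.
classify_arch() {
  case "$1" in
    aarch64|arm64)      echo "arm64" ;;   # 64-bit ARM
    armv7*|armv6*|arm)  echo "arm32" ;;   # 32-bit ARM
    x86_64|amd64)       echo "x86_64" ;;
    i?86)               echo "x86" ;;
    *)                  echo "unknown" ;; # explicit fallback, never silent
  esac
}

machine="$(uname -m)"
arch="$(classify_arch "$machine")"
if [ "$arch" = "unknown" ]; then
  echo "warning: unrecognised architecture '$machine'" >&2
fi
echo "$arch"
```

With an explicit `unknown` branch, whether to continue with a fallback path or to abort becomes a visible decision rather than an accident of which pattern matched last.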