ARM Backend using ruy for fp32 and int8 #79

Merged
merged 70 commits into from
Jan 18, 2023
Changes from 69 commits
Commits
79d7b33
ARM Backend for marian
jerinphilip Feb 8, 2022
2ac7cbc
Fix sentencepiece submodule mixup
Mar 9, 2022
93b841b
Merge branch 'browsermt-master' into arm-backend
Mar 11, 2022
9674973
[sentencepiece] android cmake additional libs
Apr 10, 2022
f3e7818
Remove separately added patch in favour of submodule update
Apr 10, 2022
5250b9e
Remove trailing newline in integer_common.h to prettify diff
Apr 10, 2022
26d3ba2
Remove trailing newline in ruy_adapter.h
Apr 10, 2022
b7969b0
Merge branch 'browsermt-master' into arm-backend
Apr 11, 2022
b271b70
In-place multiply without malloc by reinterpret_cast
jerinphilip Apr 14, 2022
efa5a85
Documentation for the stdcpp/NEON paths created
jerinphilip Apr 14, 2022
179f239
Remove templated abort transpose()
jerinphilip Apr 18, 2022
0d189c8
Reinterpret at unquantize add bias as well as int32_t from float32_t
jerinphilip Apr 18, 2022
8951261
Remove AlignedVector from ruy_adapter - not required here.
jerinphilip Apr 18, 2022
49beb50
Remove ViaRuy::PrepareBias without effect to output
jerinphilip Apr 18, 2022
3cf85f7
If SSE4.1 found use it to avoid perf regressions even if not -march=n…
jerinphilip Apr 18, 2022
4edc8ef
Deduplicate multiply by capturing variability through callbacks
jerinphilip Apr 18, 2022
a414b60
Revert "If SSE4.1 found use it to avoid perf regressions even if not …
jerinphilip Apr 18, 2022
e2069bf
CMAKE_SYSTEM_PROCESSOR indicates x86 and native mode is not enabled, …
jerinphilip Apr 18, 2022
1b4049a
Remove comments, now that callback is working
jerinphilip Apr 18, 2022
e522e6c
Minimal gemmRuy
jerinphilip Apr 19, 2022
90858a5
Update CI
jerinphilip Apr 19, 2022
557de0c
Using simd_utils instead of SIMDE
jerinphilip Apr 25, 2022
d10009f
Style fixes: UnquantizeAndWrite, UnquantizeAddBiasAndWrite
jerinphilip May 28, 2022
3a37966
const for () operator overrides
jerinphilip May 28, 2022
418a7ce
Explicit for single argument constructor: UnquantizeAndWrite
jerinphilip May 28, 2022
b7412c3
Fix typo
jerinphilip May 28, 2022
071e0d4
Low compute path for special case alpha = 1.0
jerinphilip May 28, 2022
ec886bd
Remove clang only pragmas
jerinphilip May 30, 2022
4df1998
Remove leftover bias cycles comment
jerinphilip May 30, 2022
1defce6
Merge branch 'master' into arm-backend
jerinphilip May 31, 2022
c4be980
Defaults: intgemm for x86_64 and ruy and simd_utils for arm
jerinphilip Jun 2, 2022
b181847
Revert "Defaults: intgemm for x86_64 and ruy and simd_utils for arm"
jerinphilip Jun 6, 2022
6e4c561
Target architecture detection for ARM
jerinphilip Jun 7, 2022
53636cf
Remove DEBUG statements
jerinphilip Jun 7, 2022
be9e153
Remove IntBase inheritance; PrepareB still unimplemented
jerinphilip Jun 8, 2022
876a915
Remove obsolete comment
jerinphilip Jun 8, 2022
d399a35
Remove logging statement in hotpath
jerinphilip Jun 8, 2022
06b6dd9
Use CMAKE_CXX_FLAGS instead of add_definitions 🤦
jerinphilip Jun 9, 2022
e310f73
Check: Does add_compile_{defs,opts} propogate up?
jerinphilip Jun 9, 2022
3bf1133
Fix typo: definitions
jerinphilip Jun 9, 2022
5c8b1d2
Undo edit attempts manually for min-diff; Using compile_definitions now
jerinphilip Jun 9, 2022
9dd1eff
Restore CMakeDependentOption; Rename only to ONNX_SGEMM
jerinphilip Jun 9, 2022
b055c11
Backtrack attempt to flatten ONNX_SGEMM out
jerinphilip Jun 9, 2022
d006196
USE_ONNX_SGEMM is a CMakeDependentOption
jerinphilip Jun 9, 2022
39b7237
Keep pre armv8 TargetArch detect unchanged
jerinphilip Jun 10, 2022
3a6c515
Simple ARM detection to no-op out shifted/shiftedAll paths
jerinphilip Jun 14, 2022
46db01b
Add logging statements to indicate forced gemm-path change at constru…
jerinphilip Jun 14, 2022
63fea9a
Remove run script
jerinphilip Jun 14, 2022
82a15e1
Removing kStandardCpp - may add later for tests separately
jerinphilip Jun 14, 2022
4a8c0da
Remove leftover gcc diagnostic pop for SIMDE
jerinphilip Jun 14, 2022
e17a5dd
Remove simde-no-tests reference in CMakeLists.txt file
jerinphilip Jun 14, 2022
3c8a149
Remove obsolete comments
jerinphilip Jun 14, 2022
800402c
Explain copying x86-SSE structure for NEON
jerinphilip Jun 14, 2022
9d648d0
Remove executable upload for android
jerinphilip Jun 20, 2022
d19a312
Remove comment
jerinphilip Jun 20, 2022
1b38e01
Restore -Werror
jerinphilip Jun 21, 2022
9027ea4
Switch to a {{0}} sigaction on WASM, {0} for rest
jerinphilip Jun 21, 2022
a0ee527
Revert "Restore -Werror"
jerinphilip Jun 21, 2022
6285f28
Use -DFMA for NEON from simd_utils example
jerinphilip Jun 21, 2022
8895fda
Remove redundant neon_mathfun include after simd_utils.h
jerinphilip Jun 21, 2022
c6c3ac6
Wrap CmakeLists.txt ARM definitions with an if
jerinphilip Jun 21, 2022
3baf620
Use __clang__ instead of WASM_COMPATIBLE_SOURCE; emcc uses LLVM
jerinphilip Jun 23, 2022
aa1842c
Suppress warnings by #pragma GCC diagnostic ...
jerinphilip Jun 23, 2022
8eae08b
Re-enable -Werror
jerinphilip Jun 23, 2022
9a541c4
{0} -> {} to work around empty-braces Werror
jerinphilip Jun 23, 2022
4b80399
Replace -Wall with -Wcomment
jerinphilip Jun 24, 2022
ac8de91
Revert "Replace -Wall with -Wcomment"
jerinphilip Jun 24, 2022
38b608a
Disable formatting then local edit -Wall -> -Wcomment
jerinphilip Jun 24, 2022
86c8d44
Do not check for BLAS on usual ARM, except Mac: Apple Accelerate
jerinphilip Jun 27, 2022
861e31d
Fix endif: CMakeScript quirks
jerinphilip Jul 1, 2022
120 changes: 120 additions & 0 deletions .github/workflows/arm.yml
@@ -0,0 +1,120 @@
name: ARM
'on':
  push:
    branches:
    - main
    - ci-sandbox
  pull_request:
    branches:
    - '**'
env:
  ccache_basedir: ${{ github.workspace }}
  ccache_dir: "${{ github.workspace }}/.ccache"
  ccache_compilercheck: content
  ccache_compress: 'true'
  ccache_compresslevel: 9
  ccache_maxsize: 200M
  ccache_cmake: -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_C_COMPILER_LAUNCHER=ccache
  ndk: "${{ github.workspace }}/android-ndk-r23b"
  abi: "arm64-v8a"
  minsdk_version: 28
  android_platform: 28

jobs:
  ubuntu:
    name: "arm-v8a cross-compile via Android NDK"
    runs-on: ubuntu-latest

    steps:
    - name: Checkout
      uses: actions/checkout@v2
      with:
        submodules: recursive

    - name: Install prerequisites
      run: |
        wget -c --quiet https://dl.google.com/android/repository/android-ndk-r23b-linux.zip
        unzip -qq android-ndk-r23b-linux.zip
        sudo apt-get -y install ccache cmake

    - name: Generate ccache_vars for ccache based on machine
      shell: bash
      id: ccache_vars
      run: |-
        echo "::set-output name=hash::$(echo ${{ env.ccache_compilercheck }})"
        echo "::set-output name=timestamp::$(date '+%Y-%m-%dT%H.%M.%S')"

    - name: Cache-op for build-cache through ccache
      uses: actions/cache@v2
      with:
        path: ${{ env.ccache_dir }}
        key: ccache-${{ matrix.identifier }}-${{ steps.ccache_vars.outputs.hash }}-${{ github.ref }}-${{ steps.ccache_vars.outputs.timestamp }}
        restore-keys: |-
          ccache-${{ matrix.identifier }}-${{ steps.ccache_vars.outputs.hash }}-${{ github.ref }}
          ccache-${{ matrix.identifier }}-${{ steps.ccache_vars.outputs.hash }}
          ccache-${{ matrix.identifier }}

    - name: ccache environment setup
      run: |-
        echo "CCACHE_COMPILER_CHECK=${{ env.ccache_compilercheck }}" >> $GITHUB_ENV
        echo "CCACHE_BASEDIR=${{ env.ccache_basedir }}" >> $GITHUB_ENV
        echo "CCACHE_COMPRESS=${{ env.ccache_compress }}" >> $GITHUB_ENV
        echo "CCACHE_COMPRESSLEVEL=${{ env.ccache_compresslevel }}" >> $GITHUB_ENV
        echo "CCACHE_DIR=${{ env.ccache_dir }}" >> $GITHUB_ENV
        echo "CCACHE_MAXSIZE=${{ env.ccache_maxsize }}" >> $GITHUB_ENV

    - name: ccache prolog
      run: |-
        ccache -s # Print current cache stats
        ccache -z # Zero cache entry

    - name: Generate buildfiles for marian on android via cmake
      run: |-
        mkdir -p build
        cd build
        NDK=${{ env.ndk }}
        ABI=${{ env.abi }}
        MINSDK_VERSION=${{ env.minsdk_version }}
        ANDROID_PLATFORM=${{ env.android_platform }}
        OTHER_ANDROID_ARGS=(
          -DANDROID_ARM_NEON=TRUE
        )
        OTHER_MARIAN_ARGS=(
          -DCOMPILE_CUDA=off
          -DCOMPILE_CPU=on
          -DCMAKE_HAVE_THREADS_LIBRARY=1
          -DCMAKE_USE_WIN32_THREADS_INIT=0
          -DCMAKE_USE_PTHREADS_INIT=1
          -DTHREADS_PREFER_PTHREAD_FLAG=ON
          -DBUILD_ARCH=armv8-a
          # -DCOMPILE_WITHOUT_EXCEPTIONS=on # Apparently this can reduce the binary size, let's see.
        )
        # Additionally list variables finally configured.
        cmake -L \
          -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
          -DANDROID_TOOLCHAIN=clang \
          -DANDROID_ABI=$ABI \
          -DANDROID_PLATFORM=$ANDROID_PLATFORM \
          -DANDROID_NATIVE_API_LEVEL=$MINSDK_VERSION \
          -DANDROID_TOOLCHAIN_NAME=arm-linux-androideabi-4.8 \
          -DANDROID_STL=c++_static \
          -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_C_COMPILER_LAUNCHER=ccache \
          "${OTHER_ANDROID_ARGS[@]}" "${OTHER_MARIAN_ARGS[@]}" \
          ..


    - name: Build marian for android
      working-directory: build
      run: |-
        # Only build marian (lib) for now.
        make -j2

    - name: ccache epilog
      run: 'ccache -s # Print current cache stats'

    - uses: actions/upload-artifact@v2
      with:
        path: ${{github.workspace}}/build/marian-decoder


6 changes: 6 additions & 0 deletions .gitmodules
@@ -23,3 +23,9 @@
[submodule "src/3rd_party/onnxjs"]
path = src/3rd_party/onnxjs
url = https://github.com/abhi-agg/onnxjs.git
[submodule "src/3rd_party/ruy"]
path = src/3rd_party/ruy
url = https://github.com/google/ruy
[submodule "src/3rd_party/simd_utils"]
path = src/3rd_party/simd_utils
url = https://github.com/JishinMaster/simd_utils/
89 changes: 74 additions & 15 deletions CMakeLists.txt
@@ -16,6 +16,19 @@ set(CMAKE_CXX_STANDARD_REQUIRED ON)

include(CMakeDependentOption)

# Architecture detection
include(TargetArch)

target_architecture(CMAKE_TARGET_ARCHITECTURES)
list(LENGTH CMAKE_TARGET_ARCHITECTURES cmake_target_arch_len)
if(NOT "${cmake_target_arch_len}" STREQUAL "1")
[Review comment, Collaborator] What does this mean, sorry? Is this 32bit vs 64bit? A small clarifying comment?

[Review comment, Member] Does this catch the unknown arch condition, and is that desirable?

set(CMAKE_TARGET_ARCHITECTURE_UNIVERSAL TRUE)
set(CMAKE_TARGET_ARCHITECTURE_CODE "universal")
else()
set(CMAKE_TARGET_ARCHITECTURE_UNIVERSAL FALSE)
set(CMAKE_TARGET_ARCHITECTURE_CODE "${CMAKE_TARGET_ARCHITECTURES}")
endif()

# Custom CMake options
option(COMPILE_CPU "Compile CPU version" ON)
option(COMPILE_CUDA "Compile GPU version" ON)
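
On the two review questions about the length check: the sketch below shows what the check distinguishes, under the assumption that `target_architecture()` fills its output variable with a CMake list holding one entry per detected architecture (so a multi-architecture build yields several entries). The helper `classify_arch` and the sample architecture names are hypothetical and are not part of this PR. The script is self-contained and can be run with `cmake -P`.

```cmake
# Hypothetical helper mirroring the length check in the hunk above.
# ARGN holds the architecture list; out_var receives the resulting code.
function(classify_arch out_var)
  list(LENGTH ARGN arch_count)
  if(NOT arch_count EQUAL 1)
    # Zero or several entries: treated as a "universal" (multi-arch) build.
    set(${out_var} "universal" PARENT_SCOPE)
  else()
    # Exactly one entry: passed through verbatim, whatever its value is.
    set(${out_var} "${ARGN}" PARENT_SCOPE)
  endif()
endfunction()

classify_arch(code_single armv8)           # one architecture (assumed name)
classify_arch(code_multi x86_64 arm64)     # several architectures (assumed input)
classify_arch(code_unknown unknown)        # assumed placeholder for an undetected arch

message(STATUS "single:  ${code_single}")
message(STATUS "multi:   ${code_multi}")
message(STATUS "unknown: ${code_unknown}")
```

So the check is about one versus several target architectures, not 32-bit versus 64-bit. Under the assumption above, a single placeholder entry such as `unknown` has length 1 and would be passed through as the architecture code rather than being flagged. Whether the `TargetArch` module actually reports undetected targets that way is not confirmed by this page.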
@@ -31,24 +44,58 @@ option(USE_CCACHE "Use ccache compiler cache (https://ccache.dev)" OFF)
option(USE_CUDNN "Use CUDNN library" OFF)
option(USE_DOXYGEN "Build documentation with Doxygen" ON)
option(USE_FBGEMM "Use FBGEMM" OFF)
option(USE_INTGEMM "Use INTGEMM" OFF)
option(USE_RUY "Use Ruy" OFF)
option(USE_MKL "Compile with MKL support" ON)
option(USE_MPI "Use MPI library" OFF)
option(USE_NCCL "Use NCCL library" ON)
option(USE_SENTENCEPIECE "Download and compile SentencePiece" ON)
option(USE_STATIC_LIBS "Link statically against non-system libs" OFF)
option(GENERATE_MARIAN_INSTALL_TARGETS "Generate Marian install targets (requires CMake 3.12+)" OFF)
option(M32_BINARIES "Generate 32bit binaries even when building outside of WASM. Useful for testing some WASM specific functionality without the need for the compiling to WASM." OFF)
option(COMPILE_WASM "Compile (wasm compatible) marian for WASM target" OFF)
option(USE_WASM_COMPATIBLE_SOURCE "Enable the minimal marian sources that compile to wasm. Useful for debugging wasm failures by building same sources natively" OFF)

option(USE_SIMD_UTILS "Enable simd_utils to target instruction sets" OFF)
option(USE_RUY_SGEMM "Compile with Ruy SGEMM" OFF)
option(COMPILE_WITHOUT_EXCEPTIONS "Compile without exceptions" OFF)

# cmake options that are dependent on USE_WASM_COMPATIBLE_SOURCE cmake option
CMAKE_DEPENDENT_OPTION(USE_THREADS "Compile with multi-threading support" OFF
"USE_WASM_COMPATIBLE_SOURCE" ON)
CMAKE_DEPENDENT_OPTION(USE_WASM_COMPATIBLE_BLAS "Compile with wasm compatible blas" ON
CMAKE_DEPENDENT_OPTION(USE_ONNX_SGEMM "Compile with wasm compatible blas" ON
"USE_WASM_COMPATIBLE_SOURCE" OFF)
CMAKE_DEPENDENT_OPTION(COMPILE_WITHOUT_EXCEPTIONS "Compile without exceptions" ON
"USE_WASM_COMPATIBLE_SOURCE" OFF)

if(${CMAKE_TARGET_ARCHITECTURE_CODE} MATCHES "arm")
set(USE_RUY ON)

# Apple M1 has Apple Accelerate(?).
if(NOT APPLE)
set(USE_RUY_SGEMM ON)
endif(NOT APPLE)

set(USE_SIMD_UTILS ON)
else()
set(USE_INTGEMM ON)
endif()
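
The architecture check above uses `MATCHES`, which is a regular-expression search, so any architecture code containing the substring `arm` selects the ruy and simd_utils defaults and everything else falls back to intgemm. A minimal sketch of that behaviour follows. The sample architecture codes and the `int8_backend` variable are hypothetical, chosen only for illustration; only the `MATCHES "arm"` test is taken from the diff. It can be run with `cmake -P`.

```cmake
# Sample architecture codes are assumptions, not values confirmed by this PR.
foreach(arch_code armv7 armv8 x86_64 i386 universal)
  if(arch_code MATCHES "arm")
    # Same test as in the diff: a regex search for the substring "arm".
    set(int8_backend "ruy + simd_utils")
  else()
    # Anything that does not contain "arm" takes the intgemm default.
    set(int8_backend "intgemm")
  endif()
  message(STATUS "${arch_code} -> ${int8_backend}")
endforeach()
```

One consequence worth noting: a code such as `universal`, or any ARM spelling that does not contain `arm`, would take the intgemm branch under this test.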

if(USE_INTGEMM)
add_compile_definitions(USE_INTGEMM=1)
endif(USE_INTGEMM)

if(USE_SIMD_UTILS)
if(${CMAKE_TARGET_ARCHITECTURE_CODE} MATCHES "arm")
add_compile_definitions(ARM FMA SSE) #added for ARM
endif()
if(MSVC)
add_compile_options(/flax-vector-conversions)
else(MSVC)
add_compile_options(-flax-vector-conversions)
endif(MSVC)
endif(USE_SIMD_UTILS)


if (USE_WASM_COMPATIBLE_SOURCE)
set(SPM_BUILD_LIBRARY_ONLY ON CACHE BOOL "Build only sentencepiece library (skip building executables)")
add_compile_definitions(WASM_COMPATIBLE_SOURCE)
@@ -61,10 +108,11 @@ if (COMPILE_WASM)
set(WORMHOLE ON CACHE BOOL "Use WASM wormhole in intgemm https://bugzilla.mozilla.org/show_bug.cgi?id=1672160")
endif()

if(M32_BINARIES OR COMPILE_WASM)

if(COMPILE_WASM)
set("BUILD_WIDTH" "-m32")
else(M32_BINARIES OR COMPILE_WASM)
set("BUILD_WIDTH" "-m64")
else(COMPILE_WASM)
set("BUILD_WIDTH" "")
endif()

if(NOT COMPILE_WASM)
@@ -194,7 +242,6 @@ if(MSVC)
add_definitions(-DUSE_FBGEMM=1 -DFBGEMM_STATIC=1)
endif(USE_FBGEMM)
else(MSVC)

# Check we are using at least g++ 5.0
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU" AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS 5.0)
message(FATAL_ERROR "FATAL ERROR: Compiling Marian requires at least g++ 5.0, your version is ${CMAKE_CXX_COMPILER_VERSION}")
@@ -249,12 +296,14 @@ else(MSVC)
# -msse4.1 once marian can solely be compiled with intgemm ("onnxjs" will be removed in that case)
set(INTRINSICS "-mssse3 -msimd128")
else()
set(INTRINSICS "-msse4.1")
if(CMAKE_SYSTEM_PROCESSOR STREQUAL x86_64 OR CMAKE_SYSTEM_PROCESSOR STREQUAL amd64)
set(INTRINSICS "-msse4.1")
endif ()
endif()

if(USE_FBGEMM)
set(EXT_LIBS ${EXT_LIBS} fbgemm dl)
add_definitions(-DUSE_FBGEMM=1)
add_compile_definitions(USE_FBGEMM=1)
endif(USE_FBGEMM)

if (CMAKE_CXX_COMPILER_ID MATCHES "Clang" AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 9.0)
@@ -324,6 +373,7 @@ else(MSVC)
endif(COMPILE_WASM)
endif(MSVC)


# with gcc 7.0 and above we need to mark fallthrough in switch case statements
# that can be done in comments for backcompat, but CCACHE removes comments.
# -C makes gcc keep comments.
@@ -544,24 +594,32 @@ endif(USE_MPI)
###############################################################################
# Find BLAS library for CPU compilation
if(COMPILE_CPU)
set(EXT_LIBS ${EXT_LIBS} intgemm) # Move the intgemm bits on top since they compile with every single variant
if(USE_INTGEMM)
set(EXT_LIBS ${EXT_LIBS} intgemm) # Move the intgemm bits on top since they compile with every single variant
endif(USE_INTGEMM)

if(USE_RUY OR USE_RUY_SGEMM)
set(EXT_LIBS ${EXT_LIBS} ruy)
endif(USE_RUY)

add_definitions(-DCOMPILE_CPU=1) # Move the compile CPU definition on top since we want to compile intgemm when we set compile CPU
# in case a BLAS vendor is not found, we have a runtime error, although we should probably not allow the compilation to go on
# if there are BLAS vendors, we have other runtime checks with sane error messages.
if(USE_WASM_COMPATIBLE_BLAS)
if(USE_ONNX_SGEMM)
## Use a wasm compatible BLAS
## ^ SGEMM != BLAS
set(EXT_LIBS ${EXT_LIBS} onnx-sgemm)
set(BLAS_FOUND TRUE)
set(BLAS_VENDOR "ONNX-SGEMM")
add_definitions(-DBLAS_FOUND=1 -DWASM_COMPATIBLE_BLAS=1) # Might be required in some cmake files further down the line, let's avoid using add_compile_definitions in this codeblock
add_definitions(-DUSE_ONNX_SGEMM=1) # Might be required in some cmake files further down the line, let's avoid using add_compile_definitions in this codeblock
elseif(APPLE AND USE_APPLE_ACCELERATE)
set(BLAS_VENDOR "Accelerate")
# see https://developer.apple.com/documentation/accelerate for more info
# you may need to install Xcode command line tools if you don't have them already (https://developer.apple.com/xcode/features/)
include_directories("/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Headers")
set(EXT_LIBS ${EXT_LIBS} "-framework Accelerate")
add_definitions(-DBLAS_FOUND=1)
else(USE_WASM_COMPATIBLE_BLAS)
elseif(USE_RUY_SGEMM)
add_compile_definitions(USE_RUY_SGEMM=1)
else(USE_ONNX_SGEMM)
if(USE_MKL)
find_package(MKL)
endif(USE_MKL)
@@ -582,7 +640,8 @@ if(COMPILE_CPU)
endif(CBLAS_FOUND)
endif(BLAS_FOUND)
endif(MKL_FOUND)
endif(USE_WASM_COMPATIBLE_BLAS)
endif(USE_ONNX_SGEMM)

endif(COMPILE_CPU)
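
Because this page shows the diff without `+`/`-` markers, the resulting order of the SGEMM provider checks is hard to read off. The sketch below shows one plausible reading of the chain after the change: ONNX SGEMM first, then Apple Accelerate, then ruy, then the MKL/BLAS lookup. That reading is an assumption inferred from the renamed conditions, not something the page states outright. The helper `pick_sgemm_provider` and the provider labels are hypothetical. The script can be run with `cmake -P`.

```cmake
# Hypothetical helper: reports which branch of the if/elseif chain would be
# taken for the option values visible in the calling scope.
function(pick_sgemm_provider out_var)
  if(USE_ONNX_SGEMM)
    set(provider "onnx-sgemm")
  elseif(APPLE AND USE_APPLE_ACCELERATE)
    set(provider "Apple Accelerate")
  elseif(USE_RUY_SGEMM)
    set(provider "ruy")
  else()
    # Fallback branch: MKL first, then the generic BLAS/CBLAS lookup.
    set(provider "MKL or BLAS/CBLAS lookup")
  endif()
  set(${out_var} "${provider}" PARENT_SCOPE)
endfunction()

# Example option values (assumed): a non-Apple ARM build with ruy SGEMM on.
set(USE_ONNX_SGEMM OFF)
set(USE_APPLE_ACCELERATE OFF)
set(USE_RUY_SGEMM ON)

pick_sgemm_provider(sgemm_provider)
message(STATUS "SGEMM provider: ${sgemm_provider}")
```

With this ordering, enabling `USE_ONNX_SGEMM` takes precedence over every other provider, and ruy is only reached when neither ONNX SGEMM nor Apple Accelerate is selected.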

###############################################################################