Start on work on moving calculation of derivatives to apply loop for actions that pass vectors and matrices #1093

gtribello · 2024-07-04T15:34:01Z

Description

This is the PR for the benchmarking that we discussed on Wednesday.

DO NOT MERGE

I will provide more explanation tomorrow.

Target release

I would like my code to appear in release 2.11

Type of contribution

changes to code or doc authored by PLUMED developers, or additions of code in the core or within the default modules
changes to a module not authored by you
new module contribution or edit of a module authored by you

Copyright

I agree to transfer the copyright of the code I have written to the PLUMED developers or to the author of the code I am modifying.

the module I added or modified contains a COPYRIGHT file with the correct license information. Code should be released under an open source license. I also used the command cd src && ./header.sh mymodulename in order to make sure the headers of the module are correct.

Tests

I added a new regtest or modified an existing regtest to validate my changes.
I verified that all regtests are passed successfully on GitHub Actions.

…vatives

…oring values as they are only needed on the backward pass to calculate forces

gtribello · 2024-07-05T07:52:10Z

Hi @Iximiel and @GiovanniBussi

I ran out of time to give the details of this yesterday so here they are.

You can calculate a coordination number using this version of the code using the following input:

c1: CONTACT_MATRIX GROUPA=1-3 GROUPB=1-512 SWITCH={EXP D_0=4 R_0=0.5 D_MAX=6}
ones: ONES SIZE=512
cc: MATRIX_VECTOR_PRODUCT ARG=c1,ones
mts: SUM ARG=cc PERIODIC=NO
PRINT ARG=mts FIle=colvar
BIASVALUE ARG=mts

I think it should be possible to reproduce the benchmarks that Daniele was doing on the various versions of the coordination number using the input above.

By default everything the calculation will be run in the same way that it is run in the current master version of the code. That is why all the regtests are passing for this branch. Basically the regtests are doing the same thing that they would be doing in the current master. In other words, when the regtests are run all the linked lists are still used.

To run without using the linked lists you use the following environment variable:

export PLUMED_FORBID_CHAINS=yes

It then does all the calculation of forces during the backwards (apply) loop.

It would be really nice if we could look at the performance of the code with the following options:

Calculation without BIASVALUE (i.e. no derivatives) and without PLUMED_FORBID_CHAINS option set.
Calculation without BIASVALUE (i.e. no derivatives) and with PLUMED_FORBID_CHAINS=yes option set.
Calculation with BIASVALUE (i.e. with derivatives) and without PLUMED_FORBID_CHAINS option set.
Calculation with BIASVALUE (i.e. with derivatives) and with PLUMED_FORBID_CHAINS=yes option set.

If we could have some slides with the performance of those four options by next Wednesday so we can discuss next steps in the meeting that would be really great. The benchmarks I am thinking off would look at the performance of the code as you increase the number of atoms. Essentially the sorts of grants @Iximiel has been producing to look at the various GPU flavours.

Thanks

Iximiel · 2024-07-08T08:19:00Z

@gtribello I am setting up the benches right now

A couple of questions:

In how the atom systems of the benches are set up, the first 3 atoms in each system are in a corner, and I think that this might test the branch if dist>dmax then skip of the various systems for most of the couples
- If I do not change the input file: How should I scale the systems? 3*n vs 512*n or 3 vs 512*n?

gtribello · 2024-07-08T09:07:40Z

Hi @Iximiel

I don't understand the queries. However, I made the input more complicated than it needed to be. You can do:

c1: CONTACT_MATRIX GROUP=1-100 SWITCH={EXP D_0=4 R_0=0.5 D_MAX=6}
ones: ONES SIZE=100
cc: MATRIX_VECTOR_PRODUCT ARG=c1,ones
mts: SUM ARG=cc PERIODIC=NO
PRINT ARG=mts FIle=colvar
BIASVALUE ARG=mts

To increase the number of atoms, you change the number of indices specified to GROUP in the CONTACT_MATRIX. You must also increase the SIZE in ONES to the number of atoms.

Does that help?

Iximiel · 2024-07-08T10:26:01Z

Ok, I'll go with GROUP=@mdatoms then, I was wondering if there was a reason for the groupA-groupB input.

On that note, there is a special kw for the number of atoms for ONES SIZE=?

gtribello · 2024-07-08T11:28:32Z

There is no special keyword for the number of atoms for ONES sorry.

Iximiel · 2024-07-10T09:13:08Z

As you asked, here's my benchmark set up, the example is for a local run, on LEONARDO I create a single run file per natoms-distr combination:

#!/bin/bash

nsteps=1000
list_of_natoms="500 2000 4000 6000 8000 10000 12000 14000 16000"
export PLUMED_MAXBACKUP=0

export PLUMED_NUM_THREADS=1
mpiprocesses=1
useDistr="line sc"

function setFname() {
  echo "${distr}_${mpiprocesses}mpi_${PLUMED_NUM_THREADS}threads_${natoms}_Steps${nsteps}"
}
mpicmd=""
if [[ $mpiprocesses -gt 1 ]]; then
  mpicmd="mpirun -n $mpiprocesses"
fi

if [[ $1 = run ]]; then
  module purge
  module load gnu9/9.4.0 openmpi4/4.1.1
  module load plumed/master09-07
  for distr in $useDistr; do

    for natoms in $list_of_natoms; do

      fname=$(setFname).out

      echo "output file: ${fname}"
      $mpicmd plumed benchmark \
        --plumed=plumed_master.dat:plumed_other.dat \
        --kernel=this:../../src/lib/install/libplumedKernel.so \
        --nsteps="${nsteps}" \
        --natoms="${natoms}" \
        --atom-distribution="${distr}" \
        >"${fname}"

      grep -B1 Comparative "${fname}"

    done
  done
fi
#postprocess the output
for distr in $useDistr; do
  echo "collecting ./$distr"
  {
    for natoms in $list_of_natoms; do
      echo -n "$natoms "
      fname=$(setFname).out
      grep Calculating "${fname}" |
        awk '{printf "%f ", $7}'
      echo ""
    done
  } >"timing_calculate_${distr}_s${nsteps}.data"
  {
    for natoms in $list_of_natoms; do
      echo -n "$natoms "
      fname=$(setFname).out
      sed -n '/PLUMED: *Cycles *Total *Average *Minimum *Maximum/{n ; p}' "${fname}" |
        awk '{printf "%f ", $3}'
      echo ""
    done
  } >"timing_total_${distr}_s${nsteps}.data"
done

--plumed is the list or the plumed file, --kernel is the kernel or the list of kernels
you can have a 1 inputfile - n kernels or a n inputfiles - 1 kernel or n inputfiles - n kernels and kernel1 will get input1, kernel2 input2 and so on

gtribello · 2024-07-10T09:49:06Z

Hi @Iximiel

Thanks for the script for the benchmarking. I quickly looked at the code and found something obvious that was causing a massive slowdown. Can you quickly rerun the benchmarks we discussed in today's meeting with this new version of the code and see what difference it makes?

The two options might now be comparable in cost. It would be great if you could post the benchmarks here and tag me in the post.

Thanks

Iximiel · 2024-07-10T12:15:51Z

Hi @gtribello!
Here's the results with the last commit d15c12f (green) and the old version b9b8969 (red)

the old one uses the time I measured yesterday

gtribello · 2024-07-10T13:29:58Z

Hi @Iximiel

Thanks for running the benchmarks again.

…employing the chain

src/matrixtools/MatrixTimesVector.cpp

+  if( actionInChain() ) { ActionWithMatrix::performTask( task_index, myvals ); return; }
+
+  if( sumrows ) {
+    unsigned n=getNumberOfArguments()-1; Value* myvec = getPntrToArgument(n);


gtribello · 2024-07-16T14:12:33Z

Hi @Iximiel and @GiovanniBussi

I had another look at the code in this branch and did some optimisations. These are the new timings against the master branch:

To be clear, the master here is doing the derivatives in the forward loop. When the master code runs, all the mechanics with the linked lists are used.

For the tests with this branch, I am using the PLUMED_FORBID_CHAINS=yes environment variable and turning off all the linked lists. Derivatives are thus calculated during the backwards (apply) loop.

It would be great to have one more benchmark on Leonardo, like the one in @Iximiel's previous message. Can you please set that up, @Iximiel?

If @Iximiel's benchmark also shows similar performance between the two versions, we can (perhaps) justify making the more significant change. In other words, we can use forces calculated in the backwards (apply) loop everywhere. This change will simplify the code. Removing the linked lists should also make it easier to add parallelism with GPUs to this part of the code.

What do you think? Would you agree that this is an OK way to proceed?

To clarify these changes will be made for v2.11 and there will be no changes of syntax.

Iximiel · 2024-07-17T14:36:41Z

@gtribello here are the benchmarks I did on LEONARDO:

The color code is the usual: in lighter color the "4 Calculating (forward loop)" part.
Initially, I did the benches with export PLUMED_NUM_THREADS=8, but they diverged a lot from the one you posted, so I added a run with PLUMED_NUM_THREADS=1
everything was run with

fname=sc_${PLUMED_NUM_THREADS}threads_2000nat_500steps
plumed benchmark --plumed="sc_cm2000master.dat:sc_cm2000target.dat" \
  --kernel="/path/to/master/lib/libplumedKernel.so:/path/to/plumed-derivatives-from-backpropagation/lib/libplumedKernel.so" \
  --maxtime=840 \
  --natoms=2000 --nsteps=500 --atom-distribution=sc >"${fname}.out"

The plumed file is, adapted to the number of atoms

c1: CONTACT_MATRIX GROUP=1-4000 SWITCH={EXP D_0=4 R_0=0.5 D_MAX=6}
ones: ONES SIZE=4000
cc: MATRIX_VECTOR_PRODUCT ARG=c1,ones
mts: SUM ARG=cc PERIODIC=NO
PRINT ARG=mts FIle=sc_colvar-bias-master1706-4000
BIASVALUE ARG=mts

I also did the same calculations, but with `--atom-distribution=line`, they are in the spoiler here:

…arrays.

gtribello · 2024-07-19T11:04:28Z

Hello @Iximiel and @GiovanniBussi

I've been working more on writing a version of PLUMED where the derivatives are calculated during the backward loop. I've also gotten confused. It would help me @Iximiel if you could run another benchmark just so that I can check that my conclusions about the state of the project are correct. Sorry and thank you in advance.

So, firstly, the confusion. I realised that the switching function parameters we used in that example input were not particularly sensible. With those switching function parameters, every single atom in the box with 8000 atoms was within the D_MAX cutoff of 6 nm that we set, which is not really how this type of CV is used.

I thus created a new input with more sensible switching function parameters. It is below:

c1: CONTACT_MATRIX GROUP=@mdatoms SWITCH={EXP D_0=1.0 R_0=0.1 D_MAX=2.0} 
ones: ONES SIZE=8000
cc: MATRIX_VECTOR_PRODUCT ARG=c1,ones
mts: SUM ARG=cc PERIODIC=NO 
PRINT ARG=mts FILE=colvar 
BIASVALUE ARG=mts

Separately @Iximiel it would be useful to add some detail to the manual for plumed benchmark about how the various possible configurations that can be used for benchmarking are setup. I picked the switching function parameters above by trial and error. It would be useful to understand what the configurations are. In other words, it would be good to know the typical spacing between atoms for the configurations used for Benchmarking.

When I run benchmarks, I get the following results:

I am thus concluding the following:

My version of coordination is WAY faster than COORDINATION - hurrah for link cells.
The new version of my coordination is faster than the version in master if you are not calculating derivatives
The new version of my coordination is slower than the version in master if you are calculating derivatives.

Thanks.

Iximiel · 2024-07-19T12:25:17Z

Hi @gtribello, with coordination you simply used

c: COORDINATION GROUPA=@mdatoms SWITCH={EXP D_0=4 R_0=0.5 D_MAX=6}

PRINT ARG=* FILE=@distr@_COORDColvar@natoms@ FMT=%8.4f STRIDE=1

FLUSH STRIDE=1

or something more elaborate?

I'll write something in the --help for plumed benchmark.
The rule of thumb is that the mean density should be ~ 1 atom pern nm^3 for the 100% random distribution (sphere, cube, etc.) and that the distance between first neighbors oscillates around 1 nm in the line and sc(simple cubic) distributions:

the line one is a line of atoms 1x1xnatoms in dimension, and I don't remember if it is compatible with pbcs
with "sc" the atoms are placed in a simple cubic reticle. The fill rule is: take a sc reticle of qxqxq atoms where q^3 is the nearest upper cubic number to natoms (q^3=1,8,27...) and the atoms will fill the reticle from the bottom up

I think it is better to use sc for a "realistic system" and line for stressing the NN optimizations

gtribello · 2024-07-19T13:36:07Z

Hello again @Iximiel

I did indeed write the input for coordination in the way you said.

I also used the sc for the atom distribution as I agree that it is a better test for the benchmarking.

Separately, if you could add the list of options for the --atom-distribution flag in its description that would be a good first thing to add to the documentation.

src/valtools/VStack.cpp

gtribello · 2024-07-19T15:37:58Z

Hello again

I found a bottleneck and got the new version to go a bit faster. You can see the plot here:

What I wrote in my previous message is still true, though:

With one thread and no derivatives, the new version is faster
With four threads and no derivatives, master is faster
With derivatives, master is faster with one and four threads

…hat we already worked out that some of the elements of the matrix are zero and that the forces on these elements are zero

…new scheme for forces Changes to reference files are because of accumulated small differences because the calculations are done in a different way.

I have removed the test on the forces for this regtest as the code with the new derivatives performs this calculation very slowly because I am no longer able to work out the subset of atoms on which to calculate the q6 parameter. Also I think this test is not very robust when the calculation is done in different ways. I am pretty certain that the forces for this type of calculation are calculated correctly as there are other tests that test forces that work in similar situations. However, some careful benchmarking of this type of calculation needs to be peformed before this branch is merged

See previous commit for explanation as to why test on forces has been removed.

src/matrixtools/MatrixTimesMatrix.cpp

… by default. Changes to regtest config files mostly remove code for setting this option on in some regtests. A few tests don't yet work with the new implementation and have been changed so as to run with the old implementation until I get them fixed.

…being calculated during the backwards loop For the beta sheet variables all the strands cutoff functionality is done in separate DISTANCE and CUSTOM actions. I prefer this way of doing things as it makes all the steps of the calculations visible in the expanded version of the shortcut's input file.

…me. Now everything in the code can run with the new scheme

src/secondarystructure/ParabetaRMSD.cpp

  std::string nopbcstr=""; bool nopbc; parseFlag("NOPBC",nopbc); if( nopbc ) nopbcstr = " NOPBC";
-  if( strands_cutoff.length()>0 ) strands_cutoff=" CUTOFF_ATOMS=6,21 STRANDS_CUTOFF="+strands_cutoff;
+  // if( strands_cutoff.length()>0 ) strands_cutoff=" CUTOFF_ATOMS=6,21 STRANDS_CUTOFF="+strands_cutoff;


…n forward loop

…BondProductMatrix

… over matrix elements rather than over the rows of the matrix These clases now inherit from ActionWithVector rather than ActionWithMatrix Changes to regtests revert changes that were made in an earlier commit when I started reworking the derivatives. In other words, the forces in these actions are back to the values that we had when we were using the chains derivatives. I had introduced an error in an earlier commit

src/core/Value.cpp

src/function/Custom.cpp

@@ -320,7 +323,7 @@
  action->log.printf("  with variables :");
  for(unsigned i=0; i<var.size(); i++) action->log.printf(" %s",var[i].c_str());
  action->log.printf("\n"); function.set( func, var, action );
-  std::vector<double> zeros( action->getNumberOfArguments(), 0 ); double fval = abs(function.evaluate(zeros));
+  std::vector<double> zeros( nargs, 0 ); double fval = abs(function.evaluate(zeros));


src/function/FunctionOfMatrix.h

src/matrixtools/MatrixTimesVector.cpp

…y longer

… no longer used for forces

src/core/ActionWithMatrix.cpp

+  for(unsigned i=0; i<getNumberOfComponents(); ++i) {
+    Value* myval = const_cast<Value*>( getConstPntrToComponent(i) ); unsigned ncols = myval->getNumberOfColumns();
+    if( myval->getRank()!=2 || myval->hasDerivatives() || ncols>=myval->getShape()[1] ) continue;
+    myval->matrix_bookeeping[task_index*(1+ncols)]=0;


To fix the problem, we need to ensure that the multiplication is performed using a larger integer type to prevent overflow. This can be done by casting one of the operands to the larger type before performing the multiplication. In this case, we will cast task_index to size_t before the multiplication.

General fix: Cast one of the operands to a larger type before performing the multiplication.

Detailed fix: In the file src/core/ActionWithMatrix.cpp, on line 58, cast task_index to size_t before performing the multiplication.

Specific changes: Modify line 58 to cast task_index to size_t.

Requirements: No additional methods, imports, or definitions are needed.

src/core/Value.cpp

-  matpos=p;
+  ncols = myarg->getNumberOfColumns(); matrix_bookeeping.resize( myarg->matrix_bookeeping.size() );
+  for(unsigned i=0; i<matrix_bookeeping.size(); ++i) matrix_bookeeping[i] = myarg->matrix_bookeeping[i];
+  data.resize( shape[0]*ncols ); inputForce.resize( shape[0]*ncols );


To fix the problem, we need to ensure that the multiplication is performed using a larger integer type before the result is used. This can be achieved by casting one of the operands to std::size_t before performing the multiplication. This way, the multiplication will be done using the larger type, preventing overflow.

The specific changes required are:

Cast shape[0] to std::size_t before multiplying it by ncols in the reshapeMatrixStore and copyBookeepingArrayFromArgument methods.

src/core/Value.cpp

-  matpos=p;
+  ncols = myarg->getNumberOfColumns(); matrix_bookeeping.resize( myarg->matrix_bookeeping.size() );
+  for(unsigned i=0; i<matrix_bookeeping.size(); ++i) matrix_bookeeping[i] = myarg->matrix_bookeeping[i];
+  data.resize( shape[0]*ncols ); inputForce.resize( shape[0]*ncols );


To fix the problem, we need to ensure that the multiplication is performed using a larger integer type before the result is assigned to the size variable. This can be achieved by casting one of the operands to std::size_t before performing the multiplication. This way, the multiplication will be done using the larger type, preventing any overflow.

General Fix: Cast one of the operands to std::size_t before performing the multiplication.

Detailed Fix: In the reshapeMatrixStore and copyBookeepingArrayFromArgument methods, cast shape[0] to std::size_t before multiplying it by ncols.

Specific Changes: Modify lines 276 and 293 in the file src/core/Value.cpp.

src/matrixtools/MatrixTimesMatrix.cpp

-    unsigned kind = myarg->getRowIndex( index1, i );
-    addDerivativeOnMatrixArgument( stored_matrix1, 0, 0, index1, kind, dvec1[i]/(2*matval), myvals );
-    addDerivativeOnMatrixArgument( stored_matrix2, 0, 1, kind, ind2, dvec2[i]/(2*matval), myvals );
+    unsigned kind = myarg->getRowIndex( index1, i ); if( colno[i]<0 ) continue;


src/matrixtools/MatrixTimesVector.cpp

… lot of malloc and optimise the code

gtribello · 2024-09-23T09:36:06Z

Hello @Iximiel and @GiovanniBussi

I am finally in a position to discuss how we move forward with this tomorrow morning. The latest timings for Coordination number, Q6 and LQ6 are shown below. The bars show the times for 20 steps. You can see that the new code performs much better than the current master.

In the graph above, the colours are as follows:

Blue = Current master 1 thread
Orange = New version 1 thread
Green = Current master 4 threads
Red = New version 4 threads

A small problem is that the code crashes when you try to run calculations of Q6 with derivatives for 8000 atoms with 4 threads. I think my laptop doesn't have enough memory to run calculations this large. This problem can probably be fixed. Before working too much on that, however, I would like to try to work out how to involve Daniele in this project and how we can start having some of this code run on GPUs.

See you tomorrow.

--

For reference, input for Q6 is:

q6: Q6 SPECIES=@mdatoms SWITCH={EXP D_0=1.0 R_0=0.1 D_MAX=2.0} SUM 
PRINT ARG=q6_sum FILE=colvar 
BIASVALUE ARG=q6_sum

and for LQ6

q6: Q6 SPECIES=@mdatoms SWITCH={EXP D_0=1.0 R_0=0.1 D_MAX=2.0} 
lq6: LOCAL_Q6 SPECIES=q6 SWITCH={EXP D_0=1.0 R_0=0.1 D_MAX=2.0} SUM 
PRINT ARG=lq6_sum FILE=colvar 
BIASVALUE ARG=lq6_sum

GiovanniBussi · 2024-09-23T10:11:50Z

Thanks @gtribello

I guess there's some wrong x-label, but if they all refer to something describing an increase of the number of atoms I think it's very promising. A few comments/questions:

If I understand correctly, the diffference is how you handle backpropagation. The new way on handling it is both better for the number of operations (so it gets faster now) and easier to port on GPU.
I am a bit confused by the timings. In the blue bar, are they expected to grow quadratically with N or linearly? If the correct x-label is the one in the top right figure I guess they are growing linearly, right? Do you change the scaling when you pass to the new implementation?
It's interesting that the memory problem is only with 4 threads. Perhaps the way you allocate the buffers for OMP reduction can be modified?
Again about the memory issue, would it be conceivable to rewrite this variable in blocks? Like, for coordination, it would be GROUPA=1-10 GROUPB=11-20 then GROUPA=11-20 GROUPB=1-10 then GROUPA=1-10 GROUPB=11-12 then GROUPA=11-20 GROUPB=11-20 and then take the sum? This could be even done automatically.

gtribello · 2024-09-23T14:16:33Z

Ah yes sorry. The number of atoms is always 500, 2000, 4000 and 8000. I forgot to change the tick labels on all the subfigs.

gtribello · 2024-09-23T14:17:16Z

I think the timings should grow linearly with the number of atoms because we are using a neighbour list.

…threads

gtribello · 2024-09-24T14:09:13Z

OK @GiovanniBussi and @Iximiel

I've added some further code and the memory problem is now fixed. The times including the 4 threads and 8000 atoms are as follows:

Iximiel · 2024-10-15T10:08:58Z

src/core/ActionWithVector.cpp

+  const ActionWithMatrix* am = dynamic_cast<const ActionWithMatrix*>(this);
+  myvals.setTaskIndex(current); myvals.vector_call=true; performTask( current, myvals );
+  for(unsigned i=0; i<getNumberOfComponents(); ++i) {
+    const Value* myval = getConstPntrToComponent(i);
+    if( am || myval->hasDerivatives() ) continue;
+    Value* myv = const_cast<Value*>( myval );
+    if( getName()=="RMSD_VECTOR" && myv->getRank()==2 ) continue;
+    myv->set( current, myvals.get( i ) );


This is a const function in which non-const things happen, it is called from non-const functions(runAllTasks and checkForForces), so it does not need the const modifier.
It may be easier to unflag this from its constness and change to code to something like, so there is not need to const_cast the values to modify them:

Suggested change

const ActionWithMatrix* am = dynamic_cast<const ActionWithMatrix*>(this);

myvals.setTaskIndex(current); myvals.vector_call=true; performTask( current, myvals );

for(unsigned i=0; i<getNumberOfComponents(); ++i) {

const Value* myval = getConstPntrToComponent(i);

if( am || myval->hasDerivatives() ) continue;

Value* myv = const_cast<Value*>( myval );

if( getName()=="RMSD_VECTOR" && myv->getRank()==2 ) continue;

myv->set( current, myvals.get( i ) );

myvals.setTaskIndex(current);

myvals.vector_call=true;

performTask( current, myvals );

if (const ActionWithMatrix* am = dynamic_cast<const ActionWithMatrix*>(this);!am){

const bool IamRMSD_VECTOR=getName()=="RMSD_VECTOR";

for(unsigned i=0; i<getNumberOfComponents(); ++i) {

if(Value* myval = getPntrToComponent(i);

!myval->hasDerivatives() &&

!( IamRMSD_VECTOR && myval->getRank()==2 )) {

myval->set( current, myvals.get( i ) );

}

}

I also moved some conditions outside the loop and changed the scope of am

Maybe the discrimination for RMSD_VECTOR can be done in it? Or with an ad-hoc overloaded method virtual bool skipIf(Value *)const{return true};?

Then for the parallelization, we can manipulate MultiValue into a Data sitter for cpu/gpu adresses

…atrices and grids with this command.

Gareth Aneurin Tribello added 2 commits July 4, 2024 16:07

Added flag that forces PLUMED to not use chains when calculating deri…

2982b55

…vatives

Now turning off derivatives on forward pass through task loop when st…

b9b8969

…oring values as they are only needed on the backward pass to calculate forces

gtribello added the wip Do not merge label Jul 4, 2024

gtribello requested a review from GiovanniBussi July 4, 2024 15:34

gtribello marked this pull request as draft July 10, 2024 07:52

Optimisations in adjacency matrix calculator

d15c12f

Gareth Aneurin Tribello added 3 commits July 16, 2024 11:07

Merge branch 'master' into derivatives-from-backpropegation

10da9a4

Merge branch 'master' into derivatives-from-backpropegation

ead56cd

Added faster version of matrix vector multiply that is used when not …

cd6f553

…employing the chain

github-advanced-security bot found potential problems Jul 16, 2024

View reviewed changes

Gareth Aneurin Tribello and others added 2 commits July 18, 2024 10:32

Added some optimisations in AdjacencyMatrixBase

94d33a9

Added optimisation for matrix times vector to avoid allocating large …

033d957

…arrays.

Optimisations of derivatives by back propegation

2115528

github-advanced-security bot found potential problems Jul 19, 2024

View reviewed changes

src/valtools/VStack.cpp Fixed Show fixed Hide fixed

Optimised adding of forces in AdjacencyMatrixBase by using the fact t…

7e0f435

…hat we already worked out that some of the elements of the matrix are zero and that the forces on these elements are zero

Gareth Aneurin Tribello added 5 commits September 6, 2024 19:56

Fixed code that say regtest adjmat/rt45 works as well

aaa48f6

Modified tests volumes/rt-q3-insphere and volumes/rt-small-sm to use …

e632be0

…new scheme for forces Changes to reference files are because of accumulated small differences because the calculations are done in a different way.

Fixed test volumes/rt-sean-mark to work with new derviative scheme

85bd330

See previous commit for explanation as to why test on forces has been removed.

Ran astyle

14c8511

github-advanced-security bot found potential problems Sep 8, 2024

View reviewed changes

src/matrixtools/MatrixTimesMatrix.cpp Fixed Show fixed Hide fixed

Gareth Aneurin Tribello added 4 commits September 10, 2024 14:31

Fixed QUATERNION_BOND_PRODUCT to work with new force propegation sche…

570827a

…me. Now everything in the code can run with the new scheme

Ran astyle

9d4c4e5

github-advanced-security bot found potential problems Sep 12, 2024

View reviewed changes

Gareth Aneurin Tribello and others added 4 commits September 12, 2024 11:17

First step in removing old implementation of derivatives calculated i…

6ca7a9a

…n forward loop

Removed old style derivatives from MatrixVectorProduct and Quaternion…

f8e4519

…BondProductMatrix

Ran astyle

5e562d7

github-advanced-security bot found potential problems Sep 15, 2024

View reviewed changes

Gareth Aneurin Tribello and others added 4 commits September 16, 2024 10:47

Removed all old chain stuff from FunctionOfVector

a88ff36

Removed more code that was used for Stream stuff that is not usedd an…

374d19f

…y longer

Removed some old chain functionality

03e8500

Removed code that was used for chained version of derivatives that is…

46f53ba

… no longer used for forces

github-advanced-security bot found potential problems Sep 20, 2024

View reviewed changes

Store copies large vectors in ActionWithVector so that we can avoid a…

8628615

… lot of malloc and optimise the code

Reduced the amount of memory that is used in Q6 so you can now run 4 …

636898f

…threads

Iximiel reviewed Oct 15, 2024

View reviewed changes

Fixed DUMPFORCES command so that you can output forces for vectors, m…

c9df402

…atrices and grids with this command.

@@ -275,3 +275,3 @@
               ncols=n; if( ncols>shape[1] ) ncols=shape[1];
-              unsigned size=shape[0]*ncols;
+              unsigned size=static_cast<std::size_t>(shape[0])*ncols;
               if( matrix_bookeeping.size()!=(size+shape[0]) ) {
@@ -292,3 +292,3 @@
               for(unsigned i=0; i<matrix_bookeeping.size(); ++i) matrix_bookeeping[i] = myarg->matrix_bookeeping[i];
-              data.resize( shape[0]*ncols ); inputForce.resize( shape[0]*ncols );
+              data.resize( static_cast<std::size_t>(shape[0])*ncols ); inputForce.resize( static_cast<std::size_t>(shape[0])*ncols );
             }

@@ -275,3 +275,3 @@
               ncols=n; if( ncols>shape[1] ) ncols=shape[1];
-              unsigned size=shape[0]*ncols;
+              unsigned size=static_cast<std::size_t>(shape[0])*ncols;
               if( matrix_bookeeping.size()!=(size+shape[0]) ) {
@@ -292,3 +292,3 @@
               for(unsigned i=0; i<matrix_bookeeping.size(); ++i) matrix_bookeeping[i] = myarg->matrix_bookeeping[i];
-              data.resize( shape[0]*ncols ); inputForce.resize( shape[0]*ncols );
+              data.resize( static_cast<std::size_t>(shape[0])*ncols ); inputForce.resize( static_cast<std::size_t>(shape[0])*ncols );
             }

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Start on work on moving calculation of derivatives to apply loop for actions that pass vectors and matrices #1093

Start on work on moving calculation of derivatives to apply loop for actions that pass vectors and matrices #1093

gtribello commented Jul 4, 2024

gtribello commented Jul 5, 2024

Iximiel commented Jul 8, 2024

gtribello commented Jul 8, 2024

Iximiel commented Jul 8, 2024

gtribello commented Jul 8, 2024

Iximiel commented Jul 10, 2024

gtribello commented Jul 10, 2024

Iximiel commented Jul 10, 2024

gtribello commented Jul 10, 2024

gtribello commented Jul 16, 2024 •

edited

Loading

Iximiel commented Jul 17, 2024

gtribello commented Jul 19, 2024

Iximiel commented Jul 19, 2024

gtribello commented Jul 19, 2024

gtribello commented Jul 19, 2024

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

gtribello commented Sep 23, 2024 •

edited

Loading

GiovanniBussi commented Sep 23, 2024

gtribello commented Sep 23, 2024

gtribello commented Sep 23, 2024

gtribello commented Sep 24, 2024

Iximiel Oct 15, 2024

Start on work on moving calculation of derivatives to apply loop for actions that pass vectors and matrices #1093

Are you sure you want to change the base?

Start on work on moving calculation of derivatives to apply loop for actions that pass vectors and matrices #1093

Conversation

gtribello commented Jul 4, 2024

Description

Target release

Type of contribution

Copyright

Tests

gtribello commented Jul 5, 2024

Iximiel commented Jul 8, 2024

gtribello commented Jul 8, 2024

Iximiel commented Jul 8, 2024

gtribello commented Jul 8, 2024

Iximiel commented Jul 10, 2024

gtribello commented Jul 10, 2024

Iximiel commented Jul 10, 2024

gtribello commented Jul 10, 2024

gtribello commented Jul 16, 2024 • edited Loading

Iximiel commented Jul 17, 2024

gtribello commented Jul 19, 2024

Iximiel commented Jul 19, 2024

gtribello commented Jul 19, 2024

gtribello commented Jul 19, 2024

gtribello commented Sep 23, 2024 • edited Loading

GiovanniBussi commented Sep 23, 2024

gtribello commented Sep 23, 2024

gtribello commented Sep 23, 2024

gtribello commented Sep 24, 2024

Iximiel Oct 15, 2024

Choose a reason for hiding this comment

gtribello commented Jul 16, 2024 •

edited

Loading

gtribello commented Sep 23, 2024 •

edited

Loading