commit 089048d5895a30221b6b1976c9be93ad6443420d (HEAD, tag: 0.1.0, origin/master, master)
Author: Field G. Van Zee <[email protected]>
Date: Sat Nov 9 17:18:00 2013 -0600
Added object wrappers to 1f test suite modules.
Details:
- Added missing object wrappers to level-1f test suite modules. This was
only apparent if you were configuring with something other than the
reference configuration.
- Commented out object-wrappers in level-1f front-ends. These were not
working as intended unless the reference configuration was selected, because
most kernel sets, such as those in the template set, do not have object
wrappers.
- Whitespace changes to template micro-kernels.
- Comment changes to template level-1f kernel headers.
commit 9ef3752079de10124bed906b5d28479d04aa8187
Author: Field G. Van Zee <[email protected]>
Date: Fri Nov 8 17:20:47 2013 -0600
Updated template kernels wrt KernelsHowTo wiki.
Details:
- Merged latest state of KernelsHowTo wiki into template micro-kernels
located in config/template/kernels/3.
commit 376bbb59c8944e29c5c1ff6637920d8451370afa
Author: Field G. Van Zee <[email protected]>
Date: Fri Nov 8 11:17:34 2013 -0600
Removed support for duplication.
Details:
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
and all framework code.
- Updated test suite modules according to above changes.
commit 68a5910974b62b4df853fae2a68cb04df9d5a19c
Author: Field G. Van Zee <[email protected]>
Date: Thu Nov 7 11:36:11 2013 -0600
Added comments to testsuite/input.operations.
Details:
- Added extensive comments to the top of testsuite/input.operations,
which describe how to edit the file.
- Removed input.operations.0 and input.operations.1.
- Changed input.general to test all datatypes ("sdcz") by default.
commit a98f78b715fb256a519870071bb5266130d70b21
Author: Field G. Van Zee <[email protected]>
Date: Wed Nov 6 15:32:47 2013 -0600
Changed dim_t and inc_t to be signed integers.
Details:
- Redefined dim_t and inc_t in terms of gint_t (instead of guint_t).
This will facilitate interoperability with Fortran in the future.
(Fortran does not support unsigned integers.)
- Redefined many instances of stride-related macros so that they return
or use the absolute value of the strides, rather than the raw strides
which may now be signed. Added new macros bli_is_row_stored_f() and
bli_is_col_stored_f(), which assume positive (forward-oriented) strides,
and changed the packm_blk_var[23] variants to use these macros instead
of the existing bli_is_row_stored(), bli_is_col_stored().
- Added/adjusted typecasting to various functions/macros, including
bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-
related macros in bli_param_macro_defs.h.
- Redefined bli_convert_blas_incv() macro so that the BLAS compatibility
layer properly handles situations where vector increments are negative.
Thanks to Vladimir Sukharev for pointing out this issue.
- Changed type of increment parameters in bli_adjust_strides() from dim_t
to inc_t. Likewise in bli_check_matrix_strides().
- Defined bli_check_matrix_object(), which checks for negative strides.
- Redefined bli_check_scalar_object() and bli_check_vector_object() so
that they also check for negative stride.
- Added instances of bli_check_matrix_object() to various operations'
_check routines.
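For context on the negative-increment handling above: in the reference BLAS, a negative increment means the vector is traversed backward, so the first logical element sits (n-1)*|incx| elements past the base pointer. The sketch below illustrates that convention with signed strides. The helpers `ex_vec_start` and `ex_vec_elem` are hypothetical names used only for illustration; this is not the actual bli_convert_blas_incv() macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the BLIS macro): given a BLAS-style vector
   (n, x, incx), return the address of its first logical element.
   With a negative incx the vector is walked backward, so the first
   logical element lives (n-1)*|incx| elements past x. */
const double* ex_vec_start( ptrdiff_t n, const double* x, ptrdiff_t incx )
{
    assert( n >= 0 );
    if ( incx < 0 && n > 0 ) return x + ( n - 1 ) * ( -incx );
    return x;
}

/* Fetch logical element i by stepping from the start with the signed stride. */
double ex_vec_elem( ptrdiff_t n, const double* x, ptrdiff_t incx, ptrdiff_t i )
{
    assert( 0 <= i && i < n );
    return *( ex_vec_start( n, x, incx ) + i * incx );
}
```

Because the stride stays signed, the same indexing expression serves both traversal directions, which is one reason signed dimension and increment types are convenient here.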
commit 1f8afc3e08a4312cfe810be86aedeacbc57275c5
Author: Field G. Van Zee <[email protected]>
Date: Wed Nov 6 10:09:10 2013 -0600
Minor comment update to BLAS compat files.
commit 1abbf768afafc158d44e4d5c4a135cfd9e277f13
Author: Field G. Van Zee <[email protected]>
Date: Mon Nov 4 15:50:00 2013 -0600
Fixed bugs in scalv and setv.
Details:
- Fixed bugs similar to those addressed in cca1e1f51dc6, whereby
a segmentation fault may occur if beta is not the same type as
the vector operand for scalv and setv.
- Changed axpyv and scal2v front-ends in a similar fashion.
commit f5953259a1842ee48e5833c22ac86e68a337bfe1
Author: Field G. Van Zee <[email protected]>
Date: Mon Nov 4 14:43:55 2013 -0600
Fixed a bug related to Hermitian matrix diagonals.
Details:
- Fixed a bug whereby BLIS assumed that the imaginary components of the
diagonal elements of Hermitian matrices were already zero. This property
is now enforced when the matrix is packed (bli_packm_blk_var2). Thanks
to Vladimir Sukharev for reporting this bug.
- Minor comment updates to template kernels.
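As an illustration of enforcing the Hermitian property at pack time, the following is a minimal sketch, not the code in bli_packm_blk_var2. It copies a small column-major block and zeroes the imaginary part of each diagonal element. The type `ex_zcomplex` and the function `ex_pack_herm_block` are hypothetical, and the sketch assumes the full block is stored, whereas a real packing routine would typically read only one triangle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical double-precision complex type for this sketch. */
typedef struct { double real; double imag; } ex_zcomplex;

/* Copy an m x m column-major block from a (leading dimension lda) into
   the contiguous buffer p (leading dimension m). Diagonal elements have
   their imaginary parts forced to zero, whatever the source held. */
void ex_pack_herm_block( size_t m, const ex_zcomplex* a, size_t lda, ex_zcomplex* p )
{
    assert( lda >= m );
    for ( size_t j = 0; j < m; ++j )
    {
        for ( size_t i = 0; i < m; ++i )
        {
            p[ i + j * m ] = a[ i + j * lda ];
            if ( i == j ) p[ i + j * m ].imag = 0.0;
        }
    }
}
```

Doing this during packing leaves the caller's matrix untouched while guaranteeing that the computation only ever sees real diagonal entries.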
commit d70f2b089dac8b9e4c19295dfa6014c36afee2ec
Author: Field G. Van Zee <[email protected]>
Date: Sat Nov 2 17:19:40 2013 -0500
Added scaling to abval2s, sqrt2s macros.
Details:
- Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow
and overflow from squaring the real and imaginary components. (This is
the same technique used to fix recent bugs in invscals/invscaljs and
inverts.)
commit c5b1ed9409ae2f71d04041eef5da9a0080b5784a
Author: Field G. Van Zee <[email protected]>
Date: Fri Nov 1 10:28:04 2013 -0500
Added new dotxaxpyf variant 2.
Details:
- Added a new variant for dotxaxpyf that is based on dotxf and axpyf
kernels. By default, this variant is not used by any other operation.
commit 97f89fbcf202d72fc440b614708e352ea31633e2
Author: Field G. Van Zee <[email protected]>
Date: Fri Nov 1 10:16:39 2013 -0500
Fixed bug in complex invscals.
Details:
- Fixed complex inversion in invscals and invscaljs whereby the
imaginary component was being computed incorrectly.
- Use bli_fmaxabs() instead of bli_fabs() when choosing the scalar
in inverts, invscals, and invscaljs.
- Changed bli_abs() and bli_fabs() macro definitions to use "<="
operator instead of "<".
commit eda42a21d17a2742eab69ab801ed530b82488c8a
Author: Field G. Van Zee <[email protected]>
Date: Thu Oct 31 18:00:44 2013 -0500
Defined missing symbols in bla_rotg.c
Details:
- Defined local equivalents of libf2c's r_sign(), d_sign(), c_abs(), and
z_abs(), which are needed by bla_rotg.c. Also defined r_abs() and
d_abs() for completeness. Thanks to Vladimir Sukharev for reporting
these bugs.
commit cca1e1f51dc67a2c3725d5c1837256831aaf70f8
Author: Field G. Van Zee <[email protected]>
Date: Wed Oct 30 14:39:01 2013 -0500
Fixed bugs in scalm and setm.
Details:
- Fixed bugs in scalm and setm that resulted in segmentation faults when
beta is not the same type as the matrix operand. Thanks to Vladimir
Sukharev for reporting this bug.
- Changed axpym and scal2m front-ends in fashion similar to that of scalm
and setm; namely, the alpha scalar is copy-cast to the type of the first
matrix operand.
- Changed the template and reference configurations' bli_config.h files
so that the number of memory allocator blocks of A and B are set based
on BLIS_MAX_NUM_THREADS.
- Comment updates to bli_obj.c and variable rename in bla_nrm2.c.
commit 2807013a4761c2b84b3944de64d23483ad7ef2fb
Author: Field G. Van Zee <[email protected]>
Date: Thu Oct 24 14:32:20 2013 -0500
Fixed over/under-flow in complex inversion.
Details:
- Fixed the complex bli_?inverts() macros, which were inverting elements
in an "unsafe" manner, such that very large and very small values were
unnecessarily over/under-flowing. Thanks to Vladimir Sukharev for
reporting this bug.
- Comment update to bli_sumsqv_unb_var1.c.
- Removed redundant bli_min() macro in bli_scalar_macro_defs.h.
- Changed 1.0F to 1.0 for bli_drands() macro.
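To show why scaling matters for complex inversion, here is a minimal sketch of one common scaled formulation. It is an assumption that this matches the spirit of the fix; the actual bli_?inverts() macros are not reproduced in this log. Dividing both components by s = max(|a|, |b|) before forming the denominator avoids computing a*a + b*b directly, which overflows or underflows for extreme inputs. The names `ex_zcomplex`, `ex_abs`, and `ex_invert_scaled` are hypothetical.

```c
#include <assert.h>

/* Hypothetical double-precision complex type for this sketch. */
typedef struct { double real; double imag; } ex_zcomplex;

static double ex_abs( double x ) { return x < 0.0 ? -x : x; }

/* In-place inversion of x = a + bi using scaling.
   With s = max(|a|,|b|), ar = a/s and ai = b/s, the denominator
   ar*a + ai*b equals (a*a + b*b)/s, so no component is ever squared
   at full magnitude. */
void ex_invert_scaled( ex_zcomplex* x )
{
    double abs_r = ex_abs( x->real );
    double abs_i = ex_abs( x->imag );
    double s     = abs_r > abs_i ? abs_r : abs_i;

    assert( s != 0.0 );  /* the sketch does not handle inverting zero */

    double ar    = x->real / s;
    double ai    = x->imag / s;
    double denom = ar * x->real + ai * x->imag;

    x->real =  ar / denom;
    x->imag = -ai / denom;
}
```

For an input such as 1e200 + 1e200i, the unscaled formula would square 1e200 and overflow, while the scaled version returns a small but nonzero result.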
commit 45a80c625f84edb2ade6ac25efe2b9c589d7e0df
Author: Field G. Van Zee <[email protected]>
Date: Wed Oct 23 12:15:25 2013 -0500
Fixed parameter checking issue in BLAS syr[2]k.
Details:
- Fixed a minor parameter checking bug in the BLAS compatibility layer
for [sd]syrk and [sd]syr2k. Specifically, if 'C' is passed in for the
trans parameter of either operation, it is (a) allowed, and (b) treated
as 'T' (whereas previously it was disallowed). Thanks to Vladimir
Sukharev for finding and reporting this bug.
commit a091a219bda55e56817acd4930c2aa4472e53ba5
Author: Field G. Van Zee <[email protected]>
Date: Mon Oct 14 10:11:29 2013 -0500
Minor fixes to piledriver configuration, ukernel.
Details:
- Applied a patch from Tyler that fixes minor staleness in the piledriver
configuration and gemm micro-kernel.
- Very minor changes to test suite input files.
commit dacdde27aee4fb90b14880136d7f20c6b234e2c6
Author: Field G. Van Zee <[email protected]>
Date: Fri Oct 11 11:37:19 2013 -0500
Added Fran's Sandy Bridge kernels/configuration.
Details:
- Added a kernel directory for kernels developed by Francisco Igual for
the Sandy Bridge architecture, including a dgemm ukernel coded with
AVX intrinsics.
- Added a configuration for Sandy Bridge using values supplied by Fran.
commit 03106d650e4030d4c9831683448376f92fc52d41
Author: Field G. Van Zee <[email protected]>
Date: Fri Oct 11 10:40:38 2013 -0500
Fixed minor perf bug in gemm_ker_var2.
Details:
- Fixed a minor performance bug in bli_gemm_ker_var2.c (and the experimental
bli_gemm_ker_var5.c) whereby the addresses for a_next and b_next are not
computed correctly (i.e., do not wrap around) at the edge cases. Thanks to
Tze Meng for helping me identify this bug.
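The wraparound idea can be sketched as follows. This is a generic illustration under stated assumptions: exactly which panels a_next and b_next wrap to in bli_gemm_ker_var2.c is not shown in this log, and `ex_next_panel` is a hypothetical helper. The point is only that the "next" address used as a prefetch hint should wrap back to the first panel on the final iteration rather than point past the end of the packed buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: address of the micro-panel that follows panel i
   in a packed buffer holding n_iter panels, panel_stride elements apart.
   After the last panel, the "next" panel wraps around to the first. */
const double* ex_next_panel( const double* base, size_t i,
                             size_t n_iter, size_t panel_stride )
{
    assert( i < n_iter );
    size_t i_next = ( i + 1 == n_iter ) ? 0 : i + 1;
    return base + i_next * panel_stride;
}
```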
commit b053337387dbdef9035be03538222670a21707ca
Author: Field G. Van Zee <[email protected]>
Date: Thu Oct 10 18:26:55 2013 -0500
Added fusing factors, MR/NR to test suite output.
Details:
- Updated the test suite driver (and modules where appropriate) so that
the level-1f fusing factors are output along with the variable dimension.
While this is not strictly necessary, since the fusing factors are output
in the initial parameter summary, it allows extra reassurance to the user
since the fusing factors appear alongside the variable dimension, which
together give a complete picture of the problem size. Similar changes were
made for outputting the register blocksizes when reporting results for the
micro-kernel test modules.
commit be4833bd91c5a58d0bfc52daaadf7ba543a77acf
Author: Field G. Van Zee <[email protected]>
Date: Thu Oct 10 14:20:06 2013 -0500
Added test suite modules for level-1f, 3 kernels.
Details:
- Added test modules in test suite for level-1f kernels and level-3
micro-kernels. (Duplication in the micro-kernels, for now, is NOT
supported by these test modules.)
- Added section override switches to test suite's input.operations file.
- Added obj_t APIs for level-1f front-ends and their unblocked variants to
facilitate the level-1f test modules. Also added front-end for dupl
operation.
- Added obj_t-based check routines for level-1f operations, which are
called from the new front-ends mentioned above.
- Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
factors as a function of datatype, which is needed by their respective
test modules.
- Whitespace changes to bli_kernel.h of all existing configurations.
commit 680188d46bb15b9a1a2867638104939dc77ca2a1
Author: Field G. Van Zee <[email protected]>
Date: Thu Oct 10 13:23:37 2013 -0500
Cleaned up old test drivers.
Details:
- Minor updates to old test drivers in preparation for our participation
in ACM TOMS's replicated results initiative.
commit 3690bdd4f95769c935c410414112102cc3e108b1
Author: Field G. Van Zee <[email protected]>
Date: Thu Oct 10 11:45:33 2013 -0500
More updates to level-1f kernels for core2-sse3.
Details:
- Changed types in function signatures to match new prototypes. Meant to
include this in previous commit.
commit 661d5120cd7071f9b0c5cefc95f99f1361370ade
Author: Field G. Van Zee <[email protected]>
Date: Thu Oct 10 11:27:27 2013 -0500
Fixed outdated fusing factor macros in 1f kernels.
Details:
- Updated level-1f kernels for x86_64 and bgq to use renamed fusing factor
macros. Meant to include this in 5e54f46c. Thanks to Fran for pointing
this out.
commit 73aa1e9f31d1b2a319c7e711ced6db3f9835c832
Author: Field G. Van Zee <[email protected]>
Date: Tue Oct 1 17:01:18 2013 -0500
Added section overrides to test suite.
Details:
- Added new lines of input to the test suite's input.operations file, which
allows the user to disable entire sections (levels) of tests. Before this
change, the user had to manually disable each operation test's "master
switch". (This is why input.operations.0 existed: to allow a more
convenient starting point for someone who only wanted to test one or a
few operations.)
commit 5e54f46ccb76beab892d530b693e07c6bf6db7cf
Author: Field G. Van Zee <[email protected]>
Date: Mon Sep 30 12:58:18 2013 -0500
Added template implementations and other tweaks.
Details:
- Added a 'template' configuration, which contains stub implementations of the
level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
lots of in-file comments and documentation.
- Modified some variable/parameter names for some 1/1f operations. (e.g.
renaming vector length parameter from m to n.)
- Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
to bli_kernel.h.
- Modified test suite to print out fusing factors for axpyf, dotxf, and
dotxaxpyf, as well as the default fusing factor (which are all equal
in the reference and template implementations).
- Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
reference variants were implemented in terms of front-end routines rather
than directly in terms of the kernels. (For example, axpy2v was implemented
as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
- Changed the interface to dotxf so that it matches that of axpyf, in that
A is assumed to be m x b_n in both cases, and for dotxf A is actually used
as A^T.
- Minor variable naming and comment changes to reference micro-kernels in
frame/3/gemm/ukernels and frame/3/trsm/ukernels.
commit 97aaf220a847363b4da35935eca17790c0ef71f6
Author: Field G. Van Zee <[email protected]>
Date: Tue Sep 17 10:51:36 2013 -0500
Added new kernels, configurations.
Details:
- Added various micro-kernels for the following architectures:
Intel MIC
IBM BG/Q
IBM Power7
AMD Piledriver
Loongson 3A
and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler,
and Xianyi Zhang for contributing these kernels.
- Added configurations corresponding to above architectures, and renamed
"clarksville" configuration to "dunnington".
commit fe979c5a114c877506a5697cdab1fc8cf2bcd303
Author: Field G. Van Zee <[email protected]>
Date: Fri Sep 13 14:31:53 2013 -0500
Removed default configuration behavior.
Details:
- Changed the configure script so that it no longer defaults to the
reference configuration. This change is being made so that the
developer has a firm awareness of which configuration is being used
to configure BLIS. Thanks to Mike Kistler and Bryan Marker for this
suggested change.
commit da77e9614f54f92f703f01e3b9bd67a83280150c
Author: Field G. Van Zee <[email protected]>
Date: Fri Sep 13 12:00:37 2013 -0500
Minor improvements to static memory allocator.
Details:
- Expanded on cpp macro definitions from bli_mem.c and relocated them to
a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded
functionality includes computing the pool size for each datatype (using
that datatype's cache blocksizes) and using the maximum to size the
actual pool array. This addresses the somewhat common pitfall whereby a
developer updates cache blocksizes in bli_kernel.h for only one datatype
(say, single-precision real), while the memory pools are sized using the
double-precision real values. Then, when the developer attempts to link
to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with
a message saying the static memory pool was exhausted. Clearly, this
message is misleading when the pool was not sized properly to begin with.
- Removed previously disabled code in bli_kernel_macro_defs.h that was
meant to check for size consistency among the various cache blocksizes.
(Obviously the memory pool size-based solution mentioned above is better.)
- Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a
reasonable place to put these constants, rather than further crowd up
bli_config.h.
- Updated testsuite driver to output memory pool sizes for A, B, and C.
- Minor comment updates to bli_config.h.
- Removed 'flame' configuration. It was beginning to get out-of-date, and
I hadn't used it in months. We can always re-create it later.
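The pool-sizing approach described above can be sketched with a few preprocessor macros. This is a simplified illustration, not the contents of bli_mem_pool_macro_defs.h: all `EX_` names and blocksize values are hypothetical, only two datatypes are shown, and the pool holds a single block of A.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache blocksizes, one pair per datatype. Here only the
   single-precision values were enlarged, which is the pitfall scenario. */
#define EX_MC_S 1024
#define EX_KC_S 256
#define EX_MC_D 256
#define EX_KC_D 256

/* Bytes needed for one packed block of A, per datatype. */
#define EX_POOL_A_SIZE_S ( (size_t)EX_MC_S * EX_KC_S * sizeof( float  ) )
#define EX_POOL_A_SIZE_D ( (size_t)EX_MC_D * EX_KC_D * sizeof( double ) )

/* Size the pool by the largest requirement across datatypes. */
#define EX_MAX( a, b )   ( (a) > (b) ? (a) : (b) )
#define EX_POOL_A_SIZE   EX_MAX( EX_POOL_A_SIZE_S, EX_POOL_A_SIZE_D )

char ex_pool_a[ EX_POOL_A_SIZE ];

/* Trivial acquire: any single-datatype request is guaranteed to fit. */
void* ex_pool_a_acquire( size_t bytes )
{
    assert( bytes <= sizeof( ex_pool_a ) );
    return ex_pool_a;
}
```

Had the array been sized from the double-precision values alone, the single-precision request in this configuration would not fit, which mirrors the misleading "pool exhausted" failure described above.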
commit 631f347b7a99cb02757c534fd3ec5f723a2fdb0e
Author: Field G. Van Zee <[email protected]>
Date: Tue Sep 10 17:17:28 2013 -0500
Added ESSL and Accelerate targets to test drivers.
Details:
- Added ESSL and Accelerate (OS X) targets to standalone test drivers'
Makefile in "test" directory. Thanks to Jeff Hammond for suggesting
/ providing this patch.
commit 7ae4d7a41d13ef5f1ceee217c000a5cf77a11128
Author: Field G. Van Zee <[email protected]>
Date: Tue Sep 10 16:35:12 2013 -0500
Various changes to treatment of integers.
Details:
- Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
assigned values of 32, 64, or some other value. The former two result in
defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
causes integers to be defined in terms of a default type (e.g. long int).
- Updated bli_config.h in reference and clarksville configurations according
to above changes.
- Updated test drivers in test and testsuite to avoid type warnings associated
with format specifiers not matching the types of their arguments to printf()
and scanf().
- Inserted missing #include "bli_system.h" into blis.h (which was slated for
inclusion in d141f9eeb6d1).
- Added explicit typecasting of dim_t and inc_t to macros in
bli_blas_macro_defs.h (which are used in BLAS compatibility layer).
- Slight changes to CREDITS and INSTALL files.
- Slight tweaks to Windows build system, mostly in the form of switching to
Windows-style CRLF newlines for certain files.
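A configuration macro like BLIS_INT_TYPE_SIZE could select the integer typedefs roughly as follows. The macro name and the gint_t/guint_t type names come from the entry above; the exact preprocessor logic is an assumption made for illustration, as is the default value of 64 chosen here.

```c
#include <assert.h>
#include <stdint.h>

/* Normally set in bli_config.h; defaulted here so the sketch compiles. */
#ifndef BLIS_INT_TYPE_SIZE
#define BLIS_INT_TYPE_SIZE 64
#endif

#if   BLIS_INT_TYPE_SIZE == 32
typedef int32_t           gint_t;
typedef uint32_t          guint_t;
#elif BLIS_INT_TYPE_SIZE == 64
typedef int64_t           gint_t;
typedef uint64_t          guint_t;
#else
/* Any other value falls back to a default C integer type. */
typedef long int          gint_t;
typedef unsigned long int guint_t;
#endif

/* The signed and unsigned variants should always have the same width. */
static_assert( sizeof( gint_t ) == sizeof( guint_t ),
               "gint_t and guint_t must have the same size" );
```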
commit 068437736b41d51a1f5ec47839f059bf58a20413
Author: Field G. Van Zee <[email protected]>
Date: Mon Sep 9 14:07:58 2013 -0500
Fixed set-but-not-used compiler (gcc) warnings.
Details:
- Used void-casts of certain variables to appease gcc (and perhaps other
compilers) when such variables are only used in the complex instances of
the functions. Special thanks to Karl Rupp for suggesting a portable fix
for these warnings.
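The void-cast idiom looks like this in a minimal, hypothetical example. `ex_dot_real` stands in for the real-domain instance of a routine whose complex-domain sibling would actually consume the conjugation flag; the function is not taken from BLIS.

```c
#include <assert.h>

/* Hypothetical real-domain instance of a dot product. The variable
   do_conj is set in every instance but only read in complex instances,
   which would otherwise trigger gcc's set-but-not-used warning here. */
double ex_dot_real( int n, const double* x, const double* y, int conj_flag )
{
    int do_conj = ( conj_flag != 0 );
    (void)do_conj;  /* void-cast: counts as a use, generates no code */

    assert( n >= 0 );

    double rho = 0.0;
    for ( int i = 0; i < n; ++i ) rho += x[i] * y[i];
    return rho;
}
```

The cast is portable because it relies only on standard C expression semantics rather than on compiler-specific attributes or pragmas.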
commit 6dc85f63dcd5282340c9e00d585e97d70a21edc3
Author: Field G. Van Zee <[email protected]>
Date: Mon Sep 9 13:48:52 2013 -0500
Small fix to Windows defs.mk makefile fragment.
Details:
- Commented out a !include statement that was attempting to include a
version file that does not yet exist. For now, the version string is
hard-coded into defs.mk.
commit d141f9eeb6d1de7044b7429adf52d11c6fca620c
Author: Field G. Van Zee <[email protected]>
Date: Mon Sep 9 13:09:16 2013 -0500
Added Windows build system.
Details:
- Added a 'windows' directory, which contains a Windows build system
similar to that of libflame's. Thanks to Martin for getting this up
and running.
- Spun off system header #includes into bli_system.h, which is included
in blis.h
- Added a Windows section to bli_clock.c (similar to libflame's).
commit 9b320e7406fb69e8b61a0085abe2ed89a96bdb68
Author: Field G. Van Zee <[email protected]>
Date: Mon Sep 9 11:04:46 2013 -0500
Edited bli_?lamch.c to avoid Windows keyword.
Details:
- Renamed "small" variable to "smnum" to avoid collision with Windows type
by the same name. This change is needed in advance of the upcoming Windows
build system.
commit 9013ad6ff2e9ace35e0cf44c32795c2f3d5be628
Author: Field G. Van Zee <[email protected]>
Date: Wed Sep 4 13:36:07 2013 -0500
Switched integer typedefs (again) to C types.
Details:
- Redefined gint_t and guint_t in terms of the standard C types long int
and unsigned long int, respectively.
- Changed testsuite default max problem size to 500.
- Changed testsuite input.operations to use square problems for level-3
operation tests.
commit 981a60cfa07abac2e93697dfe12b0f076ab00a38
Author: Field G. Van Zee <[email protected]>
Date: Wed Sep 4 12:09:11 2013 -0500
Falling back to 32-bit integers for dim_t, etc.
Details:
- In light of recent segfaulting issues when compiling on 32-bit systems,
I've changed the default typedef for gint_t and guint_t from int64_t and
uint64_t to int32_t and uint32_t, respectively.
- Disabled 64-bit integers in the blas2blis layer for the reference
configuration.
- Added type sizes of gint_t, guint_t, and the four floating-point datatypes
to introductory output of the testsuite.
commit b776ddcd4338b34f172ef78da0ac1d771a771ab4
Author: Field G. Van Zee <[email protected]>
Date: Tue Sep 3 21:58:07 2013 -0500
Applied temp fix to typecasting bug in testsuite.
Details:
- Applied a temporary fix to the typecasting bug in the testsuite driver.
The fix involves casting both numerator and denominator to unsigned long.
This fix is more voodoo than science, as I can't be sure why it even
works.
commit 9ee6e125373869c4213c017ce772c38ecefba103
Author: Field G. Van Zee <[email protected]>
Date: Tue Sep 3 21:53:27 2013 -0500
Changed dimension spec for gemm in testsuite.
Details:
- Encountered a bizarre typecasting bug whereby the test suite was not
computing the proper dimension from the problem size and dimension
specification when the latter was set to -3. Will investigate.
Thanks to Fran for finding this "bug".
commit e8be081e68c385ab44d0fea8dade21d40c200b79
Author: Field G. Van Zee <[email protected]>
Date: Wed Aug 28 15:52:34 2013 -0500
Generalized matlab and file output in testsuite.
Details:
- Added a new option in input.general that allows outputting in
matlab/octave format so that one can output in matlab format
independently from outputting to files.
- Adjusted input.operations according to above.
- Added input.operations.0 and input.operations.1 with all options
disabled and enabled, respectively.
commit d352c746e5683037d41b5061dfb5ce08e1d0843b
Author: Field G. Van Zee <[email protected]>
Date: Tue Aug 27 13:41:46 2013 -0500
Added single/real gemm micro-kernel for x86_64.
Details:
- Added a single-precision real gemm micro-kernel in
kernels/x86_64/3/bli_gemm_opt_d4x4.c.
- Adjusted the single-precision real register blocksizes in
config/clarksville/bli_kernel.h to be 8x4.
- Added a missing comment to bli_packm_blk_var2.c that was present in
bli_packm_blk_var3.c
commit dedda523dc5dc779ecc34e6a03dc74cb8eb220de
Author: Field G. Van Zee <[email protected]>
Date: Mon Aug 19 12:07:41 2013 -0500
Fixed bug in bli_acquire_mpart_t2b(), _l2r().
Details:
- Fixed a bug in bli_acquire_mpart_t2b() and bli_acquire_mpart_l2r()
that caused incorrect partitioning when SUBPART0 was requested. This
bug was introduced in 46d3d09d49ad. Thanks to Bryan for isolating
this bug.
- Removed dupl kernels from kernels/x86_64/3 directory.
- Uncommented beta == 0 optimization code in
kernels/x86_64/3/bli_gemm_opt_d4x4.c.
commit 12dbd2f33455e9384fe2070cbdd660fd4a7fceb5
Author: Field G. Van Zee <[email protected]>
Date: Thu Aug 8 14:39:35 2013 -0500
Moved init_safe(), finalize_safe() to BLAS compat.
Details:
- Moved the bli_init_safe() and bli_finalize_safe() function calls from the
BLAS-like BLIS layer to the BLAS compatibility layer. Having these auto-
initializers in the BLIS layer wasn't buying us anything because the user
could still call the library with uninitialized global scalar constants,
for example. Thus, we will just have to live with the constraint that
bli_init() MUST be called before calling ANY routine with a bli_ prefix.
- Added the missing bli_init_safe() and bli_finalize_safe() calls to the level-1
BLAS compatibility wrappers.
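The division of responsibility described above can be sketched as follows. All names here (`ex_lib_init_safe`, `ex_native_scale`, `ex_compat_scale`, and so on) are hypothetical, and the signatures are assumptions rather than the actual bli_init_safe()/bli_finalize_safe() interfaces. The compatibility wrapper initializes the library only if needed and finalizes only if it did the initializing, while the native routine simply assumes initialization has already happened.

```c
#include <assert.h>

static int ex_lib_initialized = 0;  /* hypothetical global library state */

void ex_lib_init( void )     { ex_lib_initialized = 1; }
void ex_lib_finalize( void ) { ex_lib_initialized = 0; }
int  ex_lib_is_initialized( void ) { return ex_lib_initialized; }

/* Initialize only if needed; report whether this call did the work. */
int ex_lib_init_safe( void )
{
    if ( ex_lib_initialized ) return 0;
    ex_lib_init();
    return 1;
}

/* Finalize only if the matching init_safe call performed the init. */
void ex_lib_finalize_safe( int did_init )
{
    if ( did_init ) ex_lib_finalize();
}

/* Native-layer routine: the caller must have initialized the library. */
double ex_native_scale( double alpha, double x )
{
    assert( ex_lib_initialized );
    return alpha * x;
}

/* Compatibility-layer wrapper: safe to call with or without prior init. */
double ex_compat_scale( double alpha, double x )
{
    int    did_init = ex_lib_init_safe();
    double result   = ex_native_scale( alpha, x );
    ex_lib_finalize_safe( did_init );
    return result;
}
```

A caller who initialized the library explicitly finds it still initialized after the wrapper returns; a caller who never initialized it finds it cleanly finalized again.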
commit 8abfe55f2ae5d89df18e1b26a5a28d94b0936683
Author: Field G. Van Zee <[email protected]>
Date: Thu Aug 8 13:30:19 2013 -0500
Miscellaneous updates.
Details:
- Changed the BLIS_HEAP_STRIDE_ALIGN_SIZE in the configurations from 16 to
BLIS_CACHE_LINE_SIZE (typically 64).
- Changed the use of nr in sizing of bd buffer to packnr in level-3 macro-
kernels.
- Reformulated gemm_ker_var2 to look more like the other level-3 macro-
kernels, in that the interior and edge-case handling is expressed once
inside the loops in the n and m dimensions, rather than the edge-case
handling being "unrolled" and expressed as distinct code regions. The
previous macro-kernel now lives in retired form in the subdirectory
other/bli_gemm_ker_var2.c.old.
- Updated experimental gemm_ker_var5 according to above change.
- Fixed bug in bli_her2k.c whereby incorrect transformations were being
applied to optimize the macro-kernel's access pattern on C when C is
row-stored.
- Various updates inside of test/exec_sizes.
commit 1aa05736ff49e7cc5f121acf615460fe9a87852c
Author: Field G. Van Zee <[email protected]>
Date: Wed Aug 7 12:27:04 2013 -0500
Fixed bug in interface of bla_ger_check().
Details:
- Fixed the misplaced lda parameter in the function signature of
bla_ger_check(). Thanks to Tyler for finding this bug.
commit 685aad25353fb200de4ca97a8bc0feeebde51d0f
Author: Field G. Van Zee <[email protected]>
Date: Tue Aug 6 12:25:51 2013 -0500
Fixed cpp guard typos in frame/compat/check files.
Details:
- Fixed instances of BLIS_ENABLE_BLIS2BLAS that should have been
BLIS_ENABLE_BLAS2BLIS. Thanks to Tyler for catching this.
- Fixed various syntax errors in the code that had yet to be compiled
due to the aforementioned bug.
commit f4ec28e723d28d998f1038f82da6986e44320ef6
Author: Field G. Van Zee <[email protected]>
Date: Thu Aug 1 11:24:23 2013 -0500
Added basic OpenMP-based gemm and packm files.
Details:
- Integrated Tyler's parallelized packm_blk_var2 and gemm_ker_var2
into the following auxiliary files
frame/1m/packm/other/bli_packm_blk_var2.c
frame/3/gemm/other/bli_gemm_ker_var2.c
The routine in the first file uses a basic OpenMP parallel region to
parallelize the packing of blocks of A and panels of B, while the
second uses a similar parallel region to parallelize along the n
dimension of the gemm macro-kernel.
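Parallelizing along the n dimension can be illustrated with a plain triple loop. This is a minimal sketch, not Tyler's implementation: `ex_gemm_par_n` is a hypothetical function operating on tightly packed column-major matrices, with no packing or micro-kernel involved. Each iteration of the outer loop owns a distinct column of C, so threads never write to the same element. If the compiler is not invoked with OpenMP enabled, the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>

/* C := C + A*B for column-major A (m x k), B (k x n), and C (m x n),
   each stored with a leading dimension equal to its row count.
   The loop over columns of C (the n dimension) is shared among threads. */
void ex_gemm_par_n( int m, int n, int k,
                    const double* a, const double* b, double* c )
{
    assert( m >= 0 && n >= 0 && k >= 0 );

    #pragma omp parallel for
    for ( int j = 0; j < n; ++j )
    {
        for ( int p = 0; p < k; ++p )
        {
            for ( int i = 0; i < m; ++i )
            {
                c[ i + j * m ] += a[ i + p * m ] * b[ p + j * k ];
            }
        }
    }
}
```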
commit f8980edf9c318453bb1962ac4939c06bf11e6d5e
Merge: 67a8b94 6e7e452
Author: Field G. Van Zee <[email protected]>
Date: Fri Jul 26 11:14:27 2013 -0500
Merge branch 'master' of https://code.google.com/p/blis
commit 67a8b9498d13b038deb316ac163e62c5b17da2ec
Author: Field G. Van Zee <[email protected]>
Date: Fri Jul 26 11:12:37 2013 -0500
Added missing cpp kernel blocksize constraints.
Details:
- Added missing C preprocessor guards in bli_kernel_macro_defs.h that enforce
constraints on the register blocksizes relative to the cache blocksizes.
Thanks to Tyler for helping me stumble across this issue.
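Preprocessor guards of this kind might look like the sketch below. The specific constraints shown (cache blocksizes being whole multiples of the corresponding register blocksizes) are an assumption about what such guards typically check, and every `EX_` name and value is hypothetical rather than taken from bli_kernel_macro_defs.h.

```c
#include <assert.h>

/* Hypothetical cache and register blocksizes. */
#define EX_MC 256
#define EX_NC 4096
#define EX_MR 8
#define EX_NR 4

/* Fail at compile time if the blocksizes are inconsistent. */
#if ( EX_MC % EX_MR ) != 0
#error "EX_MC must be a multiple of EX_MR."
#endif
#if ( EX_NC % EX_NR ) != 0
#error "EX_NC must be a multiple of EX_NR."
#endif

/* Number of MR-row micro-panels needed to cover m rows of a block. */
int ex_micro_panels_in_block( int m )
{
    assert( 0 <= m && m <= EX_MC );
    return ( m + EX_MR - 1 ) / EX_MR;
}
```

Catching a bad combination during compilation is cheaper than diagnosing the incorrect results or crashes it could otherwise cause at run time.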
commit 6e7e452343014e8f86640874dc1dbadca4a642a1
Author: Field G. Van Zee <[email protected]>
Date: Mon Jul 22 14:50:57 2013 -0500
Fixed minor warnings and misc issues.
Details:
- Fixed various warnings output by gcc 4.6.3-1, including removing some
set-but-not-used variables and addressing some instances of typecasting
of pointer types to integer types of different sizes.
commit 03f6c3599743bc837a7d40eb5b415b1bf4f2a4e9
Author: Field G. Van Zee <[email protected]>
Date: Mon Jul 22 12:54:32 2013 -0500
Tightened some macros that detect datatypes.
Details:
- Modified the definitions of some macros, such as bli_is_real(), so that
the "special" bit is taken into account so that BLIS_INT is differentiated
from BLIS_FLOAT.
- Whitespace changes to bli_obj_macro_defs.h.
- Removed BLIS_SPECIAL_BIT definition from bli_type_defs.h, since it wasn't
being used.
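The effect of taking a "special" bit into account can be shown with a hypothetical bit encoding. The `EX_` constants and their values below are assumptions made for illustration and are not claimed to match bli_type_defs.h. A test that inspects only the domain bit cannot tell an integer type from single-precision real, whereas a test that also inspects the special bit can.

```c
#include <assert.h>

/* Hypothetical encoding: bit 0 = domain (0 real, 1 complex),
   bit 1 = precision (0 single, 1 double),
   bit 2 = "special" (set for non-floating-point types). */
enum
{
    EX_DOMAIN_BIT  = 0x1,
    EX_PREC_BIT    = 0x2,
    EX_SPECIAL_BIT = 0x4
};

typedef enum
{
    EX_FLOAT    = 0,
    EX_SCOMPLEX = 1,
    EX_DOUBLE   = 2,
    EX_DCOMPLEX = 3,
    EX_INT      = 4
} ex_num_t;

/* Loose test: checks only the domain bit, so EX_INT passes as real. */
int ex_is_real_loose( ex_num_t dt )
{
    return ( dt & EX_DOMAIN_BIT ) == 0;
}

/* Tight test: additionally requires the special bit to be clear. */
int ex_is_real( ex_num_t dt )
{
    assert( dt >= EX_FLOAT && dt <= EX_INT );
    return ( dt & ( EX_DOMAIN_BIT | EX_SPECIAL_BIT ) ) == 0;
}
```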
commit b33e2f4443b9043b554963320280ff7783773652
Author: Field G. Van Zee <[email protected]>
Date: Fri Jul 19 17:15:03 2013 -0500
CHANGELOG update (for 0.0.9).
commit 0680916fdd532f7a4716b11a2515243b2c08d00f (tag: 0.0.9)
Author: Field G. Van Zee <[email protected]>
Date: Thu Jul 18 18:04:34 2013 -0500
Added BLAS error checking to compatibility layer.
Details:
- Added frame/compat/check directory, which now houses companion _check()
routines for each of the BLAS wrappers in frame/compat. These _check()
routines are called from the compatibility wrappers and mimic the
error-checking present in the netlib BLAS.
- Edited bla_xerbla.c so that xerbla() translates the operation string to
uppercase before printing.
- Redefined util routines in frame/compat/f2c/util in terms of level0
macros.
- Added prototypes for util routines, f2c routines, lsame(), and xerbla().
- Commented out prototypes in test/test_*.c since Fortran integers are now
int64_t by default (and the prototypes that were present in the files
used int).
- Removed redundant #include "bli_f2c.h" in bli_?lamch.c and bli_lsame.c,
since blis.h was already being included.
- Other minor changes to code in frame/compat/f2c.
commit 4e80ad28c97273db3366428ec44020da7944964d
Author: Field G. Van Zee <[email protected]>
Date: Thu Jul 18 17:53:31 2013 -0500
Added support for C99 complex types/arithmetic.
Details:
- Added support for C99 complex types to bli_type_defs.h and overloaded
complex arithmetic to the scalar-level macros in include/level0. This
includes a somewhat substantial reorganization and re-layering of much
of the existing machinery present in the level0 macros.
- Added new #define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files,
commented-out by default, which optionally enables the use of built-in
C99 complex types and arithmetic.
- Minor changes to clarksville and reference configs' make_defs.mk files.
- Removed an unused macro definition (bli_proj_dt_to_real_if_imag_eq0) from
bli_param_macro_defs.h.
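A minimal sketch of how a scalar-level macro layer can hide the choice between a struct-based complex type and the built-in C99 type. Only BLIS_ENABLE_C99_COMPLEX is taken from the entry above. The ex_* names are hypothetical and do not reflect the actual level0 macros.

```c
/* Uncomment to switch the sketch to built-in C99 complex arithmetic. */
/* #define BLIS_ENABLE_C99_COMPLEX */

#ifdef BLIS_ENABLE_C99_COMPLEX
  #include <complex.h>
  typedef double complex ex_dcomplex;
  #define ex_real( x )        creal( x )
  #define ex_imag( x )        cimag( x )
  /* y := r + i*I */
  #define ex_sets( r, i, y )  { (y) = (r) + (i) * I; }
  /* y := a * y, using the compiler's complex multiply */
  #define ex_scals( a, y )    { (y) = (a) * (y); }
#else
  typedef struct { double real; double imag; } ex_dcomplex;
  #define ex_real( x )        ( (x).real )
  #define ex_imag( x )        ( (x).imag )
  /* y := r + i*I */
  #define ex_sets( r, i, y )  { (y).real = (r); (y).imag = (i); }
  /* y := a * y, written out in terms of real and imaginary parts.
     Temporaries are used so y may be read and written safely. */
  #define ex_scals( a, y ) \
  { \
      double ex_yr = (a).real * (y).real - (a).imag * (y).imag; \
      double ex_yi = (a).imag * (y).real + (a).real * (y).imag; \
      (y).real = ex_yr; \
      (y).imag = ex_yi; \
  }
#endif
```

Code written against ex_sets(), ex_scals(), ex_real(), and ex_imag() compiles unchanged under either setting, which is the point of layering the arithmetic behind macros.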
commit 6072d7c848e837ba20d607f7b727438ada31bdcf
Author: Field G. Van Zee <[email protected]>
Date: Wed Jul 17 12:27:45 2013 -0500
Fixed bugs in trsm, trmm macro-kernels.
Details:
- Fixed a bug in trsm_rl_ker_var2() caused by incorrect edge case handling.
- Fixed a bug in trsm_rl_ker_var2() and trsm_ru_ker_var2() whereby k was
incorrectly being adjusted upward by MR, instead of NR. The rl and ru
trmm macro-kernels were updated in a similar fashion.
- Fixed a bug in trsm_ru_ker_var2() that was due to a missing negation on
diagoffb when recomputing k to skip a zero region below where the
diagonal intersects the right side of the block. The corresponding
trmm macro-kernel was also updated.
- Fixed a bug in trsm_ru_ker_var2() where the adjustment of k (by NR)
needed to be placed AFTER the block that recomputes k to skip the zero
region (if present). The other three trsm macro-kernels, as well as the
trmm macro-kernels, were updated in the same manner, for consistency.
- Fixed a bug in trmm_lu_ker_var2() in which the wrong dimension (n) was
being updated to skip a zero region to the left of where the diagonal
of A intersects the top edge of the block.
- Comment updates to all trsm and trmm macro-kernels.
- Comment updates to bli_packm_init.c.
commit 47410a48f9b91e94ce4c67633686ffd1f2ad0275
Author: Field G. Van Zee <[email protected]>
Date: Wed Jul 10 14:53:59 2013 -0500
Added f2c'ed Givens rotation wrappers.
Details:
- Retired (for now) existing ?rot*() BLAS compatibility wrappers to 'attic'
along with other wrappers for which no BLIS implementation exists.
- Added f2c-generated codes for applicable datatype flavors of rot, rotg,
rotm, and rotmg operations.
commit e5f90f3a8dbe671104bcb9d8b4e3409de01805da
Author: Field G. Van Zee <[email protected]>
Date: Wed Jul 10 13:40:12 2013 -0500
Removed copynz defs from bli_kernel.h files.
Details:
- Removed COPYNZ_KERNEL definition from the bli_kernel.h files in each
configuration. (Meant to include this in previous commit.)
commit aec12d90f596e8c04b1ad178258a1cd38108f59d
Author: Field G. Van Zee <[email protected]>
Date: Wed Jul 10 13:33:30 2013 -0500
Removed copynzv, copynzm and related codes.
Details:
- Removed copynzv and copynzm operation directories. These operations
implemented a variation of copyv/m that, in the case of real source
and complex destination operands, leaves the imaginary component
untouched (rather than setting it to zero). I realize now that the
special case(s) (e.g. gemm with real A and B but complex C) that I
thought required this operation actually can be handled more simply.
- Removed level0 scalar macros implementing copynzs, copynzjs.
commit b0a0a0f274a761788531b5d281cc3b411b7124ed
Author: Field G. Van Zee <[email protected]>
Date: Tue Jul 9 17:15:38 2013 -0500
Added handling of restrict, stdint.h for non-C99.
Details:
- Removed the #include <stdint.h> from blis.h and inserted a cpp macro block
in bli_type_defs.h that #includes <stdint.h> for C++ and C99, and otherwise
manually typedefs the types we need (which, for now, are unconditionally
int64_t and uint64_t).
- Moved basic typedefs to top of bli_type_defs.h, and comment changes.
- Added cpp macro block to bli_macro_defs.h that #defines restrict as
nothing for C++ and non-C99.
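The two cpp blocks described in this entry might look roughly like the following. This is a sketch under stated assumptions: the exact conditions used in bli_type_defs.h and bli_macro_defs.h may differ, the pre-C99 fallback assumes the compiler offers a 64-bit long long as an extension, and ex_sum() is a hypothetical function added only to exercise the result.

```c
/* Use <stdint.h> when compiling as C++ or C99 (or later); otherwise
   typedef the two types we need by hand. */
#if defined(__cplusplus) || \
    ( defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L )
  #include <stdint.h>
#else
  /* Assumption: long long is available and is 64 bits wide. */
  typedef long long          int64_t;
  typedef unsigned long long uint64_t;
#endif

/* C++ and pre-C99 C have no restrict keyword, so define it away. */
#if defined(__cplusplus) || \
    !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
  #define restrict
#endif

/* Hypothetical example: compiles with or without C99 support. */
static int64_t ex_sum( const int64_t* restrict x, int64_t n )
{
    int64_t i;
    int64_t sum = 0;

    for ( i = 0; i < n; ++i ) sum += x[ i ];

    return sum;
}
```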
commit 4b7e7970f1af4a1ab121e07657e2b78b9fcd7671
Author: Field G. Van Zee <[email protected]>
Date: Mon Jul 8 15:20:34 2013 -0500
Migrated integer usage to stdint.h types.
Details:
- Changed the way bli_type_defs.h defines integer types so that dim_t,
inc_t, doff_t, etc. are all defined in terms of gint_t (general signed
integer) or guint_t (general unsigned integer).
- Renamed Fortran types fchar and fint to f77_char and f77_int.
- Defined f77_int as int64_t if a new configuration variable,
BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise.
These types are defined in stdint.h, which is now included in blis.h.
- Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed
in terms of scomplex.
- Renamed "char" type in f2c files to "character" and typedef'ed in terms
of char.
- Updated bla_amax() wrappers so that the return type is defined directly
as f77_int, rather than letting the prototype-generating macro decide
the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros,
so I removed them. Also, changed the body of the wrapper so that a
gint_t is passed into abmaxv, which is THEN typecast to an f77_int
before returning the value.
- Updated f2c code that accessed .r and .i fields of complex and
doublecomplex types so that they use .real and .imag instead (now that
we are using scomplex and dcomplex).
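A rough sketch of the f77_int selection and the "compute wide, then typecast" pattern described for the amax wrappers. The names BLIS_ENABLE_BLIS2BLAS_INT64, f77_int, f77_char, and gint_t come from the entry above. The definition of gint_t as int64_t and the helper ex_to_f77_int() are assumptions made for illustration.

```c
#include <stdint.h>

/* Uncomment to request 64-bit Fortran integers. */
/* #define BLIS_ENABLE_BLIS2BLAS_INT64 */

#ifdef BLIS_ENABLE_BLIS2BLAS_INT64
typedef int64_t f77_int;
#else
typedef int32_t f77_int;
#endif

typedef char f77_char;

/* Assumption: the general signed integer is 64 bits wide. */
typedef int64_t gint_t;

/* Hypothetical helper: the framework computes its result as a gint_t,
   and only the value returned to the Fortran caller is typecast. */
static f77_int ex_to_f77_int( gint_t framework_result )
{
    return ( f77_int )framework_result;
}
```

Keeping the typecast at the wrapper boundary means the framework code never needs to know which Fortran integer size was configured.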
commit 372501398564fdba3d5a3db86c30bc1039b185ff
Author: Field G. Van Zee <[email protected]>
Date: Mon Jul 8 11:24:18 2013 -0500
Added experimental bli_gemm_ker_var5().
Details:
- Added support for an experimental gemm macro-kernel that incrementally
packs one micro-panel of B at a time. This is useful for certain
special cases of gemm where m is small.
- Minor changes to default values of clarksville configuration.
- Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we
do not yet have any use (or implementation support) for block storage.
- Comment update to bli_packm_init.c.
commit 9915d667a79f23e3a2a2516247c560e9063a1646
Author: Field G. Van Zee <[email protected]>
Date: Sun Jul 7 13:28:39 2013 -0500
Defined "total" blocksize query functions.
Details:
- Defined bli_blksz_total_for_type() and bli_blksz_total_for_obj() to query
the default blocksize plus blocksize extension (using the type or the type
of an object).
- Comment update in bli_packm_cxk.c.
commit 46d3d09d49aded1d9f1b468c83fce75e07d631dc
Author: Field G. Van Zee <[email protected]>
Date: Thu Jun 27 13:19:56 2013 -0500
Consolidated lower/upper her[2]k blocked variants.
Details:
- Consolidated lower and upper blocked variants for herk and her2k, and
renamed the resulting variants, according to the same changes recently
made to trmm and trsm.
- Implemented support for four new subpartition types:
BLIS_SUBPART1T
BLIS_SUBPART1B
BLIS_SUBPART1L
BLIS_SUBPART1R
which correspond to "merged" partitions that include the middle "1"
partition as well as either the neighboring "0" or "2" partition. This is
used to clean up code in herk/her2k var2 that attempts to partition away
the strictly zero region above or below the diagonal of a matrix operand
that is being marched through diagonally.
- Added safeguards to herk macro-kernels that skip any leading or trailing
zero region in the panel of C that is passed in. This is now needed given
that herk/her2k var1 no longer partitions off this zero region before
calling the macro-kernel (via bli_her[2]k_int()).
- Updated comments and other whitespace changes to trmm/trsm macro-kernels.
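The idea behind the merged subpartitions can be sketched in one dimension as follows. This is an illustration, not the BLIS partitioning code: the EX_* names, ex_range_t, and ex_acquire_range() are hypothetical, and the reading that "T" merges partitions 0 and 1 while "B" merges partitions 1 and 2 is an assumption. The left/right cases would be analogous along the column dimension.

```c
typedef enum
{
    EX_SUBPART0,   /* region before the current block        */
    EX_SUBPART1,   /* the current (middle) block             */
    EX_SUBPART2,   /* region after the current block         */
    EX_SUBPART1T,  /* merged: partitions 0 and 1 (assumed)   */
    EX_SUBPART1B   /* merged: partitions 1 and 2 (assumed)   */
} ex_subpart_t;

typedef struct
{
    long off;  /* starting row of the requested subpartition */
    long len;  /* number of rows in the subpartition         */
} ex_range_t;

/* Given a dimension of length m, a current offset i, and a blocksize b,
   return the row range covered by the requested subpartition. */
static ex_range_t ex_acquire_range( ex_subpart_t part, long m, long i, long b )
{
    ex_range_t r = { 0, 0 };

    /* Clip the block at the bottom edge. */
    if ( i + b > m ) b = m - i;

    switch ( part )
    {
        case EX_SUBPART0:  r.off = 0;     r.len = i;         break;
        case EX_SUBPART1:  r.off = i;     r.len = b;         break;
        case EX_SUBPART2:  r.off = i + b; r.len = m - i - b; break;
        case EX_SUBPART1T: r.off = 0;     r.len = i + b;     break;
        case EX_SUBPART1B: r.off = i;     r.len = m - i;     break;
    }

    return r;
}
```

With m = 10, i = 4, and b = 2, the merged top range covers rows 0 through 5 and the merged bottom range covers rows 4 through 9, so a caller can take the block together with one neighbor in a single step instead of acquiring two partitions and stitching them.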
commit 02002ef6f3d2746665982793db36714bd69bccc9
Author: Field G. Van Zee <[email protected]>
Date: Mon Jun 24 17:08:14 2013 -0500
Added row-storage optimizations for trmm, trsm.
Details:
- Implemented algorithmic optimizations for trmm and trsm whereby the right
side case is now handled explicitly, rather than induced indirectly by
transposing and swapping strides on operands. This allows us to walk through
the output matrix with favorable access patterns no matter how it is stored,
for all parameter combinations.
- Renamed trmm and trsm blocked variants so that there is no longer a
lower/upper distinction. Instead, we simply label the variants by which
dimension is partitioned and whether the variant marches forwards or
backwards through the corresponding partitioned operands.
- Added support for row-stored packing of lower and upper triangular matrices
(as provided by bli_packm_blk_var3.c).
- Fixed a performance bug in bli_determine_blocksize_b() whereby the cache
blocksize extensions (if non-zero) were not being used to appropriately size
the first iteration (i.e., the bottom/right edge case).
- Updated comments in bli_kernel.h to indicate that both MC and NC must be
whole multiples of MR AND NR. This is needed for the case of trsm_r where,
in order to reuse existing left-side gemmtrsm fused micro-kernels, the
packing of A (left-hand operand) and B (right-hand operand) is done with
NR and MR, respectively (instead of MR and NR).
commit d1e81ddc848ee47bc188735883d14582bdd0cabc
Author: Field G. Van Zee <[email protected]>
Date: Thu Jun 13 11:14:21 2013 -0500
Minor generalizing tweaks to trmm blk var1, var2.
commit 0efb7974f104206ba3985276f2180a9b14fe9f9b
Author: Field G. Van Zee <[email protected]>
Date: Wed Jun 12 16:40:04 2013 -0500
CHANGELOG update.
commit 5b641c3bab31eac6a1795b9f6e3f86c59651ca50 (tag: 0.0.8)
Author: Field G. Van Zee <[email protected]>
Date: Wed Jun 12 16:02:12 2013 -0500
Use separate CFLAGS for "kernels" directories.
Details:
- Added a new "special" directory type: any source code within directories
named "kernels" will be compiled with a separate CFLAGS_KERNELS set of
compiler flags. This allows the developer to specify a separate set of
flags (e.g. optimization flags) for compiling kernels while maintaining a
standard set for regular framework code.
- Fixed a bug in the top-level Makefile that was causing "noopt" code
to be compiled with the standard set of compilation flags.
- Updated make_defs.mk in reference, flame, and clarksville configurations
according to above changes.
commit 08475e7c7653ba598665071a617d10f0d8f763c2
Author: Field G. Van Zee <[email protected]>
Date: Tue Jun 11 12:18:39 2013 -0500
Various level-3 optimizations for row storage.
Details:
- Implemented remaining two cases within bli_packm_blk_var2(), which allow
packing from a lower or upper-stored symmetric/Hermitian matrix to column
panels (which are row-stored). Previously one could only pack to row panels
(which are column-stored).
- Implemented various optimizations in the level-3 front-ends that allow more
favorable access through row-stored matrices for gemm, hemm, herk, her2k,
symm, syrk, and syr2k.
- Cleaned up code in level-3 front-ends that has to do with setting target and
execution datatypes.
commit 05a657a6b92e8d34efa5c57ae6a18a4f35ec0841
Author: Field G. Van Zee <[email protected]>
Date: Fri Jun 7 11:04:10 2013 -0500
Added beta == 0 optimization to x86_64 ukernel.
Details:
- Modified x86_64 gemm microkernel so that when beta is zero, C is not read
from memory (nor scaled by beta).
- Fixed a minor bug in the test suite driver: when the "Test all combinations
of storage schemes?" switch was disabled, redundant tests would be executed
for matrix-only (e.g. level-1m, level-3) operations if multiple vector
storage schemes were specified.
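The beta == 0 optimization can be illustrated in portable C. The actual change was made in the x86_64 gemm microkernel, so the code below is only a reference-style sketch: ex_update_c() is a hypothetical function, and it assumes the product alpha-independent tile AB has already been accumulated into a contiguous column-major buffer.

```c
#include <stddef.h>

/* Update an m x n column-major tile of C:  C := beta * C + alpha * AB.
   'ab' holds the accumulated product, stored column-major with leading
   dimension m. 'ldc' is the leading dimension of C. */
static void ex_update_c( size_t m, size_t n,
                         double alpha, const double* ab,
                         double beta,  double* c, size_t ldc )
{
    size_t i, j;

    if ( beta == 0.0 )
    {
        /* C is overwritten without being read, which saves the loads
           and keeps stale NaN or Inf values in C from propagating. */
        for ( j = 0; j < n; ++j )
            for ( i = 0; i < m; ++i )
                c[ i + j*ldc ] = alpha * ab[ i + j*m ];
    }
    else
    {
        /* General case: read C, scale it by beta, and accumulate. */
        for ( j = 0; j < n; ++j )
            for ( i = 0; i < m; ++i )
                c[ i + j*ldc ] = beta * c[ i + j*ldc ]
                               + alpha * ab[ i + j*m ];
    }
}
```

Besides saving memory traffic, the separate branch matters for correctness: multiplying an uninitialized C by zero would still yield NaN wherever C happened to contain NaN or Inf.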