<!doctype html>
<html>
<head>
<meta charset="utf-8" />
<title>FPTalks</title>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link rel="stylesheet" type="text/css" href="fpbench.css">
<style>
#info {
padding-bottom: 1.25em;
margin-bottom: 1.5em;
border-bottom: 1px solid gray;
}
details {
clear: both;
margin-bottom: 1.5em;
}
summary {
list-style-type: none;
}
time {
float: right;
}
.title {
color: #198C0E;
}
details:has(.abstract) .title {
cursor: pointer;
}
.speaker {
font-style: italic;
font-size: 90%;
}
.abstract {
font-size: 80%;
margin-bottom: 2em;
}
</style>
</head>
<body>
<header>
<a href='index.html'>
<img src='img/logo.png' height='150' alt='FPTalks Logo' />
<h1>FPTalks</h1>
</a>
<p>Numerics community meetings and resources</p>
<ul>
<li><a href="index.html">Events</a></li>
<li><a href="education.html">Education</a></li>
<li><a href="community.html">Tools</a></li>
<li><a href="https://groups.google.com/a/fpbench.org/g/fpbench">Mailing List</a></li>
<li><a href="about.html">About</a></li>
</ul>
</header>
<div class="pitch">
<a href="https://groups.google.com/a/fpbench.org/g/fpbench">Join us</a> monthly to talk numerics.
</div>
<p id='info'>
FPTalks meets for an hour, on Zoom, on the first Thursday of every month at 9am Pacific.<abbr
title="Except that we skip months for the annual workshop and for the winter holidays."><sup>†</sup></abbr>
You can <a href="https://forms.gle/H5aodraRcgcJFq6W8">submit a proposal</a> to present.
</p>
<main>
<!--
<details>
<summary>
<time>TIME</time>
<div class="title">
TITLE, use sentence not title case
</div>
<div class="speaker">
TODO, University of TODO
</div>
</summary>
<div class="abstract">
<p>
TODO
</p>
</div>
</details>
-->
<!--
<details>
<summary>
<time>Aug 6, 2026</time>
<div class="title">
<a href="talks/fptalks26.html">FPTalks 2026 annual workshop</a>
</div>
<div class="speaker">
Details coming soon
</div>
</summary>
</details>
-->
<!--
<details>
<summary>
<time>July 2, 2026</time>
<div class="title">
Semantics, operations, and properties of P3109 floating-point representations in Lean
</div>
<div class="speaker">
Tung-Che Chang, Rutgers University
</div>
</summary>
<div class="abstract">
<p>
The upcoming IEEE-P3109 standard for low-precision
floating-point arithmetic can become the foundation of
future machine learning hardware and software. Unlike the
fixed types of IEEE-754, P3109 introduces a parametric
framework defined by bitwidth, precision, signedness, and
domain. This flexibility results in a vast combinatorial
space of formats -- some with as little as one bit of
precision -- alongside novel features such as stochastic
rounding and saturation arithmetic. These deviations create
a unique verification gap that this paper intends to
address.
</p>
<p>
This paper presents FLoPS, Formalization in Lean of the
P3109 Standard, which is a comprehensive formal model of
P3109 in Lean. Our work serves as a rigorous,
machine-checked specification that facilitates deep analysis
of the standard. We demonstrate the model's utility by
verifying foundational properties and analyzing key
algorithms within the P3109 context. Specifically, we reveal
that FastTwoSum exhibits a novel property of computing exact
"overflow error" under saturation using any rounding mode,
whereas previously established properties of the
ExtractScalar algorithm fail for formats with one bit of
precision. This work provides a verified foundation for
reasoning about P3109 and enables formal verification of
future numerical software. Our Lean development is open
source and publicly available.
</p>
<p>
More in our <a href="https://arxiv.org/abs/2602.15965">preprint</a>.
</p>
</div>
</details>
-->
<!--
<details>
<summary>
<time>June 4, 2026</time>
<div class="title">
TITLE, use sentence not title case
</div>
<div class="speaker">
Jeffrey Sarnoff, IEEE
</div>
</summary>
<div class="abstract">
<p>
TODO
</p>
</div>
</details>
-->
<details>
<summary>
<time>May 7, 2026</time>
<div class="title">
Accurate models of NVIDIA tensor cores
</div>
<div class="speaker">
Mantas Mikaitis, University of Leeds
</div>
</summary>
<div class="abstract">
<p>
Matrix multiplication is a fundamental operation in both neural
network training and inference. To accelerate it, Graphics
Processing Units (GPUs) implement matrix multiplication in hardware.
Due to the increased throughput over software-based matrix
multiplication, these multipliers are increasingly used outside of AI
to accelerate various applications in scientific computing. However,
matrix multipliers targeted at AI are at present not compliant with
IEEE 754 floating-point arithmetic behaviour, with different vendors
offering different numerical features. This leads to non-reproducible
results across different generations of GPU architectures, at the
matrix multiply-accumulate instruction level.
</p>
<p>
To study numerical characteristics of matrix multipliers—such
as rounding behaviour, accumulator width, normalization points, extra
carry bits, and others—test vectors are typically constructed.
Yet, these vectors may or may not distinguish between different
hardware models, and due to limited hardware availability, their
reliability across many different platforms remains largely untested.
We present software models for emulating the inner product behavior of
low- and mixed-precision matrix multipliers in the V100, A100, H100
and B200 data center GPUs in most supported input formats of interest
to mixed-precision algorithm developers: 8-, 16-, and 19-bit floating
point. These matrix multiplier models are first approximated by
determining the numerical features via test vectors designed to trigger
outputs sensitive to bit level differences in the implementation,
followed by semi-exhaustive comparison (randomised input vectors of
10<sup>7</sup> values) between the models and the actual GPU matrix
multipliers—this process is repeated until the model is bit
accurate. These models enable verification of test vectors before
applying them to real hardware and also support computational
scientists and mixed-precision algorithm developers with easy-to-use
accurate models available in MATLAB—we demonstrate their use in
multi-word emulation algorithms for matrix multiplication.
</p>
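<p>
For intuition, here is one probe in the spirit of those test vectors
(an editor's NumPy sketch, not the authors' models): it distinguishes
a binary16 accumulator from a wider one by adding terms worth half an
ulp of 1.0, which a narrow accumulator rounds away at every step.
</p>
<pre><code>import numpy as np

# Half an ulp of 1.0 in binary16 is 2**-11: adding it to 1.0 ties to
# even and rounds back to 1.0, so a binary16 accumulator never moves.
x = np.full(4, 2.0**-11, dtype=np.float16)

acc16 = np.float16(1.0)
for xi in x:
    acc16 = np.float16(acc16 + xi)   # rounds back to 1.0 at each step

acc32 = np.float32(1.0)
for xi in x:
    acc32 += np.float32(xi)          # a wide accumulator keeps every term

print(acc16)              # 1.0    -> narrow accumulator
print(np.float16(acc32))  # 1.002  -> wide accumulator, rounded once at the end
</code></pre>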
<p>
More in our <a href="https://arxiv.org/abs/2512.07004">preprint</a>.
</p>
</div>
</details>
<details>
<summary>
<time>Apr 2, 2026</time>
<div class="title">
Cost of soundness in mixed-precision tuning
</div>
<div class="speaker">
Anastasia Isychev, TU Wien
</div>
</summary>
<div class="abstract">
<p>
Numerical code is often executed repetitively and on hardware with
limited resources, which makes it a perfect target for optimizations.
One of the most effective ways to boost performance—especially in
terms of runtime—is by reducing the precision of computations.
However, low precision can introduce significant rounding errors,
potentially compromising the correctness of results. Mixed-precision
tuning addresses this trade-off by assigning the lowest possible
precision to a subset of variables and arithmetic operations in the
program while ensuring that the overall error remains within
acceptable bounds. State-of-the-art tools validate the accuracy of
optimized programs using either sound static analysis or dynamic
sampling. Sound methods are often considered safer but overly
conservative, while dynamic methods are more aggressive and potentially
more effective; the question remains: how do these approaches compare
in practice?
</p>
<p>
We present the first comprehensive evaluation of existing
mixed-precision tuning tools for floating-point programs, offering a
<em>quantitative comparison</em> between sound static and (unsound) dynamic
approaches. We measure the trade-offs between performance gains,
utilizing optimization potential, and the soundness guarantees on the
accuracy—what we refer to as the cost of soundness. Our experiments on
the standard FPBench benchmark suite challenge the common belief that
dynamic optimizers consistently generate faster programs. In fact, for
small straight-line numerical programs, we find that sound tools
enhanced with regime inference match or outperform dynamic ones, while
providing formal correctness guarantees, albeit at the cost of
increased optimization time. Standalone sound tools, however, are
overly conservative, especially when accuracy constraints are tight;
whereas dynamic tools are consistently effective for different
targets, but exceed the maximum allowed error by up to 9 orders of
magnitude.
</p>
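<p>
As a toy version of the search space involved (an editor's sketch,
hedged: the tools compared in the talk automate this over whole
programs), one can demote a single operation to a lower precision and
keep the demotion only if the final error stays under the target bound:
</p>
<pre><code>import numpy as np

x = np.float64(0.7)
ref = np.sin(x) * np.exp(x)                            # all binary64
mixed = np.float64(np.sin(np.float32(x))) * np.exp(x)  # sin demoted to binary32

rel_err = abs(mixed - ref) / abs(ref)
print(rel_err)   # ~1e-8: the demotion is kept iff this meets the error bound
</code></pre>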
<p>
More in our <a href="https://aisychev.github.io/papers/oopsla25-cos.pdf">OOPSLA 2025 paper</a>.
</p>
</div>
</details>
<details>
<summary>
<time>Mar 5, 2026</time>
<div class="title">
Challenges and opportunities in large-scale linear systems
</div>
<div class="speaker">
Ichitaro Yamazaki, Sandia National Laboratories
</div>
</summary>
<div class="abstract">
<p>
Linear solvers often play paramount roles in large-scale
scientific and engineering simulations. To maintain the
quality of our software's functionality and performance
across varied hardware, our solvers are implemented within
the open-source software framework Trilinos. However, it is
still a challenge to maintain robust implementations of the
parallel solvers, and we often spend a significant amount of
time maintaining our solvers on each new hardware platform.
Furthermore, to effectively utilize current large-scale
computers, many techniques have been proposed (communication
avoiding/hiding, asynchronous communication, mixed precision,
random sketching, etc.), which often pose further challenges
to maintaining the reliability of our solvers. In this talk,
we discuss some of these techniques as integrated into our
solvers, along with the associated challenges. The hope is to
identify potential collaborations to enhance the reliability
of our solver implementations on current and emerging
computers.
</p>
<p>
<a href="https://drive.google.com/file/d/1pRxyKsbA2tp4uE2_SmRn4-u3sIfN7Red/view?usp=sharing">Slides</a>.
</p>
</div>
</details>
<details>
<summary>
<time>
Nov 16, 2025
</time>
<div class="title">
<a href="http://fpanalysistools.org/SC25/">Tutorial on Floating-point Analysis tools</a>
</div>
<div class="speaker">
A tutorial run at Supercomputing 2025
</div>
</summary>
<div class="abstract">
High performance computing and machine learning applications
increasingly rely on mixed-precision arithmetic on CPUs and
GPUs for superior performance. However, this shift introduces
several challenging numerical issues such as increased
round-off errors, and INF and NaN exceptions that can render
the computed solutions useless. At present, this places a
heavy burden on developers, interrupting their work while they
diagnose these problems manually. This tutorial presents three
tools that target specific issues leading to floating-point
bugs. First, we present FPChecker, which not only detects and
reports INF/NaN exceptions in parallel and distributed CPU
codes, but also tells programmers about the exponent value
ranges for avoiding exceptions while also minimizing rounding
errors. Second, we present GPU-FPX, which detects
floating-point exceptions generated by NVIDIA GPUs, including
their Tensor Cores via a "nixnan" extension to GPU-FPX. Third,
we present FloatGuard, a unique tool that detects exceptions
in AMD GPUs. The tutorial is aimed at helping programmers
avoid exception bugs; for this, we will demonstrate our tools
on simple examples with seeded bugs. Attendees may optionally
install and run our tools. The tutorial also allocates
question/answer time to address real situations faced by the
attendees.
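<p>
As a minimal demonstration of the bug class these tools target (an
editor's sketch of the arithmetic only; the tools themselves instrument
GPU and CPU binaries), a low-precision overflow produces INF, and one
more operation silently turns it into NaN:
</p>
<pre><code>import numpy as np

np.seterr(over="ignore", invalid="ignore")  # silence NumPy's warnings

x = np.float16(50000.0)
y = x * np.float16(2.0)   # 100000 overflows binary16 (max 65504) -> inf
z = y - y                 # inf - inf -> nan
print(y, z, z == z)       # inf nan False: NaN is not even equal to itself
</code></pre>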
</div>
</details>
<details>
<summary>
<time>Nov 6, 2025</time>
<div class="title">
Programming with first-class rounding contexts
</div>
<div class="speaker">
Brett Saiki, University of Washington and Intel
</div>
</summary>
<div class="abstract">
<p>
Precise specification of numerical algorithms is essential for
design-space exploration, simulation, and verification of possible
designs. FPCore, a Scheme-like language that makes rounding explicit and
separate from exact mathematical operations, provides one such high-level
abstraction. However, its rounding context construct cannot capture a key
feature of an emerging class of algorithms: rounding that depends on
runtime values. To express and study these designs, we propose making
rounding contexts first-class values.
</p>
<p>
We implement these ideas in FPy, a domain-specific language for exploring
numerical algorithms. FPy allows programmers to capture both existing
FPCore programs, and algorithms with value-dependent rounding that cannot
be captured in FPCore. First-class rounding contexts enable systematic
exploration, since rounding parameters may be program inputs rather than
controlled via ad-hoc metaprogramming. We demonstrate FPy’s effectiveness
by analyzing how design parameters affect numerical error on a family of
designs from prior work that features value-dependent rounding.
</p>
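<p>
As a rough illustration of what "first-class" means here (an editor's
concept sketch in plain Python; this is not FPy's actual API), the
rounding context is ordinary data that operations consult, so it can
be computed from run-time values:
</p>
<pre><code>import math
from dataclasses import dataclass

@dataclass
class Ctx:
    p: int                       # significand bits of the rounding context

def rnd(x: float, ctx: Ctx) -> float:
    m, e = math.frexp(x)         # x = m * 2**e with |m| in [0.5, 1)
    return math.ldexp(round(m * 2**ctx.p), e - ctx.p)

def add(x, y, ctx):              # a rounded op parameterized by the context
    return rnd(x + y, ctx)

# Value-dependent rounding: the context is chosen from a run-time value.
x = 1.0 + 2.0**-20
ctx = Ctx(p=8) if x > 1.0 else Ctx(p=24)
print(add(x, 3.0, ctx))          # 4.0 when rounded to 8 bits
</code></pre>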
<p>
Please see the
<a href="https://github.com/bksaiki/fpy">repo</a>
for more info.
Try out the pip package
<a href="https://pypi.org/project/fpy2/">fpy2</a>!
</p>
</div>
</details>
<details>
<summary>
<time>Oct 2, 2025</time>
<div class="title">
RLibm-MultiRound: correct, rounding-mode-agnostic libraries
</div>
<div class="speaker">
Sehyeok Park, Rutgers University
</div>
</summary>
<div class="abstract">
<p>
Our RLibm project generates a single implementation for an elementary
function that produces correctly rounded results for multiple rounding
modes and representations with up to 32 bits. The key insight is to
build polynomials that produce the correctly rounded result for a
representation with two additional bits when compared to the largest
target representation and with the "non-standard" round-to-odd rounding
mode, which makes double rounding the RLibm math library result to any
smaller target representation innocuous. The resulting approximations
are implemented with machine-supported floating-point operations with
the round-to-nearest rounding mode. When an application uses any other
rounding mode, RLibm saves the application's rounding mode, changes the
system's rounding mode to round-to-nearest, computes the correctly
rounded result, and restores the application's rounding mode. This
frequent change of rounding modes has a performance cost.
</p>
<p>
In this talk I will cover two new methods to avoid the frequent changes
to the rounding mode and the dependence on round-to-nearest. First, our
new rounding-invariant outputs method emulates round-to-zero under all
rounding modes to implement RLibm's polynomial approximations. Second,
our rounding-invariant input bounds method factors any rounding error
due to different rounding modes using interval bounds in the RLibm
pipeline. Both methods make a different set of trade-offs and improve
the performance of resulting libraries by more than 2X.
</p>
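<p>
The round-to-odd trick is easy to state in code. Here is an editor's
Python sketch (an illustration, not RLibm's implementation) of rounding
an exact value to p bits with round-to-odd, plus a check that double
rounding through two extra bits is innocuous:
</p>
<pre><code>from fractions import Fraction

def round_to_odd(x: Fraction, p: int) -> Fraction:
    """Round x to a p-bit significand with round-to-odd: truncate toward
    zero and, if anything was lost, force the last bit to 1."""
    if x == 0:
        return x
    sign = 1 if x > 0 else -1
    x = abs(x)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    while Fraction(2) ** e > x:          # find e: x in [2**e, 2**(e+1))
        e -= 1
    while x >= Fraction(2) ** (e + 1):
        e += 1
    ulp = Fraction(2) ** (e - p + 1)     # grid spacing at this exponent
    q, r = divmod(x, ulp)                # q = truncated significand
    if r != 0 and q % 2 == 0:
        q += 1                           # inexact: force the "odd" sticky bit
    return sign * q * ulp

# Rounding to 55 bits with round-to-odd, then to binary64 (nearest-even),
# matches rounding the exact value to binary64 directly.
x = Fraction(1, 3) + Fraction(1, 10**40)
assert float(round_to_odd(x, 55)) == float(x)
</code></pre>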
<p>
Please see the
<a href="https://arxiv.org/abs/2504.07409">paper</a>
for more info.
</p>
</div>
</details>
<details>
<summary>
<time>Sep 4, 2025</time>
<div class="title">
Bean: a language for backward error analysis
</div>
<div class="speaker">
Laura Zielinski, Cornell University
</div>
</summary>
<div class="abstract">
<p>
Backward error analysis offers a method for assessing the quality of
numerical programs in the presence of floating-point rounding errors.
However, techniques from the numerical analysis literature for
quantifying backward error require substantial human effort, and there
are currently no tools or automated methods for statically deriving
sound backward error bounds. To address this gap, we propose Bean, a
typed first-order programming language designed to express quantitative
bounds on backward error. Bean's type system combines a graded coeffect
system with strict linearity to soundly track the flow of backward
error through programs. We prove the soundness of our system using a
novel categorical semantics, where every Bean program denotes a triple
of related transformations that together satisfy a backward error
guarantee.
</p>
<p>
To illustrate Bean's potential as a practical tool for automated
backward error analysis, we implement a variety of standard algorithms
from numerical linear algebra in Bean, establishing fine-grained
backward error bounds via typing in a compositional style. We also
develop a prototype implementation of Bean that infers backward error
bounds automatically. Our evaluation shows that these inferred bounds
match worst-case theoretical relative backward error bounds from the
literature, underscoring Bean's utility in validating a key property of
numerical programs: numerical stability.
</p>
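<p>
For a concrete feel for the property Bean tracks (an editor's worked
example; Bean itself derives such bounds statically, by typing): a
computed sum is the exact sum of slightly perturbed inputs, and the
perturbation is bounded by the unit roundoff.
</p>
<pre><code>from fractions import Fraction as F

x, y = 0.1, 0.2
s = x + y                                    # computed: 0.30000000000000004

# fl(x + y) = x*(1 + d) + y*(1 + d): recover the exact perturbation d.
d = (F(s) - (F(x) + F(y))) / (F(x) + F(y))
u = F(1, 2**53)                              # binary64 unit roundoff

print(float(d))                              # ~9.3e-17
assert u >= abs(d)                           # backward error bounded by u
</code></pre>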
<p>
Please see the
<a href="https://arxiv.org/abs/2501.14550">paper</a> and
<a href="https://github.com/Athena-Types/Bean">code</a>
for more info.
</p>
</div>
</details>
<details>
<summary>
<time>Aug 7, 2025</time>
<div class="title">
Numerical Computing with IEEE Floating Point Arithmetic
</div>
<div class="speaker">
Michael Overton, New York University
</div>
</summary>
<div class="abstract">
<p>
Although the basic principles of IEEE floating point arithmetic have
remained largely unchanged since the first edition of my book
Numerical Computing with IEEE Floating Point Arithmetic was published
by SIAM in 2001, the technology that supports it has changed
enormously. Every chapter of the book has been rewritten extensively,
and two new chapters have been added: one on computations with higher
precision than that mandated by the standard, needed for a variety of
scientific applications, and one on computations with lower precision
than was ever contemplated by those who wrote the standard, driven by
the massive computational demands of machine learning. Topics include
the rationale for floating point representation, correctly rounded
arithmetic and exception handling, support for the standard by
floating point microprocessors and programming languages, and an
introduction to the key concepts of cancellation, conditioning and
stability. The book gives many technical details that are not readily
available elsewhere. The second edition was published by SIAM in May
2025.
</p>
</div>
</details>
<details>
<summary>
<time>Aug 7, 2025</time>
<div class="title">
Detecting and diagnosing FP exceptions in GPUs and CPUs
</div>
<div class="speaker">
Mark Baranowski and Ganesh Gopalakrishnan, University of Utah
</div>
</summary>
<div class="abstract">
<p>
Hundreds of GPU programmers are being stymied by the NaNs and INFs that
arise during computation, often polluting loss functions (ML) and
residuals (HPC). The debugging problem is exacerbated due to GPU kernels
being closed-source and launched from scripts written in Python, Julia
etc. While one may build binary analysis tools to analyze exceptions,
separate tools are needed for different GPUs. Finally, one would like
to detect exceptions at a higher level (e.g., LLVM), but the lack of
publicly available GPU support in LLVM makes such tools easier to
target at CPUs.
</p>
<p>
In this talk, we will briefly survey tools that can help detect and
diagnose floating-point exceptions. The bulk of this talk will be
devoted to covering the tools written at Utah: namely GPU-FPX (for GPU
SIMT cores) and its 'nixnan' variant (for GPU Tensor Cores). We run a
few demos that illustrate the ease of use of GPU-FPX on a variety of
codes: simple data compressors, simple GPTs, and Python/Julia codes.
While GPU-FPX currently helps "X-ray" down the stack of kernel calls,
knowing what these kernels do and which of the detected exceptions are
relevant -- and which exception coercion rules (to normal values) are
sound -- remains unsolved. The only clear guidance we know of --
consistent exception handling due to Demmel -- does not seem to hold and
is inefficient if literally followed. Given that exceptions occur with
such high frequencies already and will multiply in their manifestations
on different hardware and software, clear guidelines for exception
coercion and blame assignment are needed.
</p>
<p>
The talk will highlight how we mined exceptions from Tensor Cores in
nixnan (Reference:
<a href="https://github.com/ganeshutah/PLDI25-Array-Workshop">github.com/ganeshutah/PLDI25-Array-Workshop</a>
), and also summarize FloatGuard (AMD Exception Checking tool from UC Davis:
HPDC'25) and FPChecker (LLVM Exception Checking from Livermore:
IISWC'22). We will devote ~15 mins to garner audience feedback to help
us prepare for our SC'25 tutorial on this topic this November in St.
Louis, MO.
</p>
<p>
Additional input from:
Xinyi Li (Utah),
Dolores Miao (UC Davis),
Harvey Dam (Utah),
Cindy Rubio-Gonzalez (UC Davis),
and Ignacio Laguna (LLNL).
</p>
</div>
</details>
<details>
<summary>
<time>
Jul 10, 2025
</time>
<div class="title">
<a href="talks/fptalks25.html">FPTalks 2025 annual workshop</a>
</div>
<div class="speaker">
12 speakers at the cutting edge of numerics research
</div>
</summary>
</details>
<details>
<summary>
<time>May 1, 2025</time>
<div class="title">
Fast branch-free extended-precision floating-point arithmetic
</div>
<div class="speaker">
David K. Zhang, Stanford University
</div>
</summary>
<div class="abstract">
<p>
Many scientific and mathematical problems demand extremely precise calculations
exceeding the limits of the double precision (IEEE binary64) floating-point format.
However, existing methods for extended-precision computation involve complex
branching algorithms that perform poorly on modern data-parallel processors,
including SIMD CPUs and GPUs. In this talk, I introduce a class of algorithms
called floating-point accumulation networks (FPANs) that enable fast branch-free
extended-precision arithmetic. FPAN-based algorithms outperform standard multiprecision
libraries by orders of magnitude, achieving up to 11.7x the peak performance of QD,
34.4x over CAMPARY, 35.6x over MPFR, and 41.4x over FLINT. I also introduce a new
formal verification technique that leverages automatic theorem provers to rigorously
establish the correctness of these new algorithms.
</p>
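<p>
For background, the error-free transformation that such networks
compose is easy to show (an editor's sketch of the classic branch-free
TwoSum; the talk's FPANs arrange many of these into larger networks).
It uses six floating-point operations and no comparisons, which is
what makes it friendly to SIMD and GPU execution:
</p>
<pre><code>def two_sum(a: float, b: float):
    """Return s = fl(a + b) and the exact error t, so a + b == s + t."""
    s = a + b
    ap = s - b               # the part of s attributed to a
    bp = s - ap              # the part of s attributed to b
    t = (a - ap) + (b - bp)  # what rounding discarded, recovered exactly
    return s, t

s, t = two_sum(1e16, 1.0)
print(s, t)                  # 1e+16 1.0: the 1.0 lost in s survives in t
</code></pre>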
</div>
</details>
<details>
<summary>
<time>Apr 3, 2025</time>
<div class="title">
Fast sound error bounds for mixed-precision real evaluation
</div>
<div class="speaker">
Artem Yadrov, University of Utah
</div>
</summary>
<div class="abstract">
<p>
Evaluating real-valued expressions to high precision is a key
building block in computational mathematics, physics, and numerics.
A typical implementation uses a uniform precision for each
operation, and doubles that precision until the real result can be
bounded to some sufficiently narrow interval. However, this is
wasteful: usually only a few operations really need to be performed
at high precision, and the bulk of the expression could use much
lower precision. Uniform precision can also waste iterations
discovering the necessary precision and then still overestimate by
up to a factor of two. We propose to instead use mixed-precision
interval arithmetic to evaluate real-valued expressions. A key
challenge is deriving the mixed-precision assignment both soundly
and quickly. To do so, we introduce a sound variation of error
Taylor series and condition numbers, specialized to interval
arithmetic, that can be evaluated with minimal overhead thanks to
an “exponent trick”. Our implementation, Reval, achieves an
average speed-up of 1.47× compared to the state-of-the-art Sollya
tool, with the speed-up increasing to 4.92× on the most difficult
input points. An examination of the precisions used with and without
precision tuning shows that the speed-up results come from quickly
assigning lower precisions to the majority of operations.
</p>
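<p>
For contrast with Reval, the uniform-precision baseline is only a few
lines (an editor's sketch, assuming mpmath's <code>iv</code> interval
context; Reval instead assigns each operation its own precision):
</p>
<pre><code>from mpmath import iv

def eval_to_width(expr, width):
    """Evaluate expr() in interval arithmetic, doubling the working
    precision until the enclosure of the real result is narrow enough."""
    prec = 24
    while True:
        iv.prec = prec
        y = expr()
        if float(y.delta) > width:   # y.delta = width of the enclosure
            prec *= 2                # redo every operation at twice the precision
            continue
        return y, prec

y, prec = eval_to_width(lambda: iv.sin(iv.mpf(10) ** 20), 1e-15)
print(prec, y)                       # sin(1e20) needs well beyond 53 bits
</code></pre>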
</div>
</details>
<details>
<summary>
<time>Mar 6, 2025</time>
<div class="title">
Floating-point neural network verification
</div>
<div class="speaker">
Edoardo Manino, The University of Manchester
</div>
</summary>
<div class="abstract">
<p>
Safety-critical systems with neural network components require strong
guarantees. While existing verification techniques have shown great
progress towards this goal, they mostly reason on real-valued
abstractions of neural networks. As soon as we consider their
floating-point behaviour, the associated verification problem becomes
harder. In this talk, we discuss a software verification approach to
this problem. In doing so, we introduce NeuroCodeBench, a benchmark
of neural network code for software verification. With it, we show
the advantages and shortcomings of reasoning on neural networks at
the floating-point level.
</p>
</div>
</details>
<details>
<summary>
<time>
Feb 6, 2025
</time>
<div class="title">
LP meets PL: efficient linear programming using a geometric lens
</div>
<div class="speaker">
Mridul Aanjaneya, Rutgers University
</div>
</summary>
<div class="abstract">
<p>
Linear programs (LP) arise in many domains, such as robotics,
databases, machine learning, computer graphics, etc. LP solvers are
also widely used in the PL community for analyzing vulnerabilities in C
source code, designing correctly rounded Math libraries, repairing deep
neural networks, Presburger arithmetic for polyhedral compilation, etc.
Linear programs with billions of constraints are beyond the reach of
modern LP solvers. However, for many practical applications, the linear
program is "low-dimensional", i.e., it has many constraints but only a
few unknown variables. In this talk, we'll first discuss how random
sampling can be used to provably solve feasible low-dimensional linear
programs with billions of constraints, with guarantees on the expected
number of iterations. Next, we'll discuss how this approach requires
enhancements for infeasible linear programs, where a small number of
conflicting constraints make the linear program have no solution. We'll
show that a geometric perspective can be used for efficiently
eliminating the conflicting constraints and computing a solution that
satisfies the maximum number of constraints. We'll present applications
of our LP solver for generating correctly rounded Math libraries for
floating-point variants. Time permitting, we'll also discuss
applications in interpretable machine learning.
</p>
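<p>
To make the sampling idea concrete, here is an editor's simplified
Clarkson-style loop (hedged: not the speaker's algorithm, and with a
box constraint added so early sub-LPs stay bounded): solve a small
random subset, then fold in constraints the tentative optimum violates.
</p>
<pre><code>import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d = 1_000_000, 3                       # many constraints, few unknowns
A = rng.normal(size=(n, d))
b = rng.uniform(1.0, 2.0, size=n)         # x = 0 is feasible since b > 0
c = -np.ones(d)                           # maximize sum(x)
box = [(-100.0, 100.0)] * d               # keeps early sub-LPs bounded

S = rng.choice(n, size=6 * d * d, replace=False)   # small working set
while True:
    x = linprog(c, A_ub=A[S], b_ub=b[S], bounds=box).x
    viol = np.flatnonzero(A @ x > b + 1e-9)
    if viol.size == 0:
        break                             # x satisfies all n constraints
    S = np.union1d(S, rng.choice(viol, size=min(viol.size, 6 * d * d)))

print(x)
</code></pre>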
</div>
</details>
<details>
<summary>
<time>
Dec 5, 2024
</time>
<div class="title">
Aster: sound mixed fixed-point quantizer for neural networks
</div>
<div class="speaker">
Debasmita Lohar, Karlsruhe Institute of Technology
</div>
</summary>
<div class="abstract">
<p>
Neural networks are increasingly becoming integral to safety-critical
applications, such as controllers in embedded systems. While formal
safety verification focuses on idealized real-valued networks, practical
applications require quantization to finite precision, inevitably
introducing roundoff errors. Manually optimizing precision, especially
for fixed-point implementation, while ensuring safety is complex and
time-consuming.
</p>
<p>
In this talk, I will introduce Aster, the sound, fully automated,
mixed-precision, fixed-point quantizer for deep feed-forward neural
networks. Aster reduces the quantization problem to a mixed-integer
linear programming (MILP) problem, thus efficiently determining minimal
precision to guarantee predefined error bounds. Our evaluations show
that Aster's optimized code reduces machine cycles when compiled to an
FPGA with a commercial HLS compiler, compared to the (sound)
state of the art. Furthermore, Aster handles significantly more
benchmarks faster, especially for larger networks with thousands of
parameters.
</p>
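<p>
The cost model underneath is simple to state (an editor's sketch;
Aster's contribution is solving the precision-assignment problem as a
MILP, which this does not show): quantizing to f fractional bits costs
at most 2<sup>-(f+1)</sup> of roundoff per value, so fewer bits mean
cheaper hardware but a larger error term for the safety bound to absorb.
</p>
<pre><code>import numpy as np

def to_fixed(x, f):
    """Round x to a fixed-point grid with f fractional bits."""
    return np.round(x * 2.0**f) / 2.0**f

w = 0.123456789
for f in (4, 8, 16):
    err = abs(to_fixed(w, f) - w)
    assert 2.0**-(f + 1) >= err      # roundoff is at most half a grid step
    print(f, err)
</code></pre>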
</div>
</details>
<details>
<summary>
<time>
Nov 17, 2024
</time>
<div class="title">
<a href="http://fpanalysistools.org/SC24/">Tutorial on Floating-point Analysis tools</a>
</div>
<div class="speaker">
A tutorial run at Supercomputing 2024
</div>
</summary>
<div class="abstract">
Floating-point arithmetic is central to HPC and ML, with the
variety of number formats, hardware platforms, and compilers
exploding in this era of heterogeneity. This unfortunately
increases the incidence of numerical issues including
exceptions such as Infinity and NaN that can render the
computed results unreliable or change control-flows,
introduces excessive rounding that breaks the assumptions made
in the numerical algorithm in use, and overall causes result
non-reproducibility when code is optimized or ported across
platforms. In this tutorial, we present three novel tools: (1)
GPU-FPX, which exposes silent exceptions in NVIDIA GPU
computations, (2) Ciel, which pinpoints where compilers
silently over-optimize and cause non-reproducibility, and (3)
Herbie, which improves the accuracy of a programmer-written
expression, significantly reducing rounding error or
eliminating exceptions. This half-day tutorial will consist of
(1) presentations of floating-point basics, (2) demos of all
our numerical debugging tools, presenting their principle of
operation and ideal usage contexts, and (3) plenty of time for
Q/A, especially on using these tools within the organization
of the attendees. New and emerging technologies such as Tensor
Cores will be introduced by showing how to test for
non-portability of codes across them.
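<p>
As a taste of the third tool (an editor's illustration of the kind of
rewrite Herbie finds; this textbook example is not Herbie's output):
sqrt(x+1) - sqrt(x) cancels catastrophically for large x, while the
algebraically equivalent form 1/(sqrt(x+1) + sqrt(x)) does not.
</p>
<pre><code>import math

x = 1e16
naive = math.sqrt(x + 1) - math.sqrt(x)
stable = 1.0 / (math.sqrt(x + 1) + math.sqrt(x))

print(naive)    # 0.0:   every significant digit cancelled
print(stable)   # 5e-09: close to the true value
</code></pre>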
</div>
</details>
<details>
<summary>
<time>Nov 7, 2024</time>
<div class="title">
Exploring FPGA designs for MX and beyond
</div>
<div class="speaker">
Ebby Samson, Imperial College London
</div>
</summary>
<div class="abstract">
<p>
A number of companies recently worked together to release the new Open
Compute Project MX standard for low-precision computation, aimed at
efficient neural network implementation. In our work, we describe and
evaluate the first open-source FPGA implementation of the arithmetic
defined in the standard. Our designs fully support all the standard's
concrete formats for conversion into and out of MX formats and for the
standard-defined arithmetic operations, as well as arbitrary
fixed-point and floating-point formats. Certain elements of the
standard are left as implementation-defined, and we present the first
concrete FPGA-inspired choices for these elements. Our library of
optimized hardware components is available open source, alongside our
open-source Pytorch library for quantization into the new standard,
integrated with the Brevitas library so that the community can develop
novel neural network designs quantized with MX formats in mind. Our
testing shows that MX is very effective for formats such as INT5 or FP6
which are not natively supported on GPUs. This gives FPGAs an advantage
as they have the flexibility to implement a custom datapath and take
advantage of the smaller area footprints offered by these formats.
</p>
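<p>
In outline, the MX idea is block scaling (an editor's simplified
sketch, not the standard's exact rules): a block of 32 values shares
one power-of-two scale derived from the block's largest magnitude, and
each element is stored in a narrow format; here a toy integer grid
stands in for INT5.
</p>
<pre><code>import numpy as np

def mx_quantize(block, elem_bits=5):
    """Quantize a block to (small ints, one shared power-of-two scale)."""
    qmax = 2 ** (elem_bits - 1) - 1                 # 15 for a 5-bit grid
    amax = np.max(np.abs(block))
    if amax == 0:
        return np.zeros(block.shape, dtype=np.int8), 1.0
    scale = 2.0 ** np.ceil(np.log2(amax / qmax))    # shared E8M0-style scale
    q = np.clip(np.round(block / scale), -qmax, qmax)
    return q.astype(np.int8), scale

block = np.random.randn(32).astype(np.float32)
q, s = mx_quantize(block)
print(np.max(np.abs(block - q * s)))                # per-block quantization error
</code></pre>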
</div>
</details>
<details>
<summary>
<time>Oct 3, 2024</time>
<div class="title">
Geometric predicates for unconditional elastodynamics simulation
</div>
<div class="speaker">
Daniele Panozzo, Courant Institute of Mathematical Sciences, New York University
</div>
</summary>
<div class="abstract">
<p>
The numerical solution of partial differential equations (PDE) is
ubiquitously used for physical simulation in scientific computing and
engineering. Ideally, a PDE solver should be opaque: the user provides
as input the domain boundary, boundary conditions, and the governing
equations, and the code returns an evaluator that can compute the
value of the solution at any point of the input domain. This is
surprisingly far from being the case for all existing open-source or
commercial software, despite the research efforts in this direction
and the large academic and industrial interest. To a large extent,
this is due to lack of robustness in geometric algorithms used to
create the discretization, detect collisions, and evaluate element
validity.
</p>
<p>
I will present the incremental potential contact simulation paradigm,
which provides strong robustness guarantees in simulation codes,
ensuring, for the first time, validity of the trajectories accounting
for floating point rounding errors over an entire elastodynamic
simulation with contact. A core part of this approach is the use of a
conservative line-search to check for collisions between geometric
primitives and for ensuring validity of the deforming elements over
linear trajectories.
</p>
<p>
I will discuss both problems in depth, showing that SOTA approaches
favor numerical efficiency but are unfortunately not robust to
floating point rounding, leading to major failures in simulation. I
will then present an alternative approach based on judiciously using
rational and interval types to ensure provable correctness, while
keeping a running time comparable with non-conservative methods.
To conclude, I will discuss a set of applications enabled by this
approach in microscopy and biomechanics, including traction force
estimation on a live zebrafish and efficient modeling and simulation
of fibrous materials.
</p>
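<p>
The failure mode is easy to reproduce (an editor's classic 2D example,
not the talk's 3D predicates): the sign of a small determinant decides
which side of the line through a and b the point c lies on, and one
rounded product is enough to lose it.
</p>
<pre><code>from fractions import Fraction

def orient2d(a, b, c, num=float):
    """Sign of the cross product (b - a) x (c - a): which side is c on?"""
    ax, ay = map(num, a)
    bx, by = map(num, b)
    cx, cy = map(num, c)
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)

a = (0.0, 0.0)
b = (2.0**27, 2.0**27 - 1)
c = (2.0**27 + 1, 2.0**27)            # strictly to the left of line ab

print(orient2d(a, b, c, float))       # 0.0: one product rounds, the sign is lost
print(orient2d(a, b, c, Fraction))    # 1:   exact rationals get it right
</code></pre>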
</div>
</details>
<details>
<summary>
<time>Sep 5, 2024</time>
<div class="title">
Designing type systems for rounding error analysis
</div>
<div class="speaker">
Ariel Kellison, Cornell University
</div>
</summary>
<div class="abstract">
<p>
A central challenge in scientific computing is managing the tradeoff between accuracy
and performance. Scientific applications often require high precision to produce
meaningful results, but achieving this precision can reduce computational speed
and efficiency. For example, using higher precision arithmetic can minimize rounding
errors and improve the reliability of results, but it typically demands more processing
power and memory, which negatively affects performance. As a result, scientific software
developers must carefully balance the need for accuracy with the need for acceptable
performance.
</p>
<p>
In recent years, programming languages like Rust have demonstrated how carefully
designed type systems can help developers write high-performance code while ensuring
critical properties, such as memory safety. In this talk, we will present our work on
designing type systems that provide guarantees about the accuracy of floating-point
computations. By introducing types that can quantitatively represent the error bounds
of floating-point computations, we can create statically typed programming languages
that alert developers to potentially significant inaccuracies early in the development
process.
</p>
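<p>
As a run-time caricature of the idea (an editor's sketch; the talk's
type systems establish these bounds statically, so no extra work
happens at run time), each value can carry a bound on its accumulated
relative error, with every operation contributing one rounding:
</p>
<pre><code>from dataclasses import dataclass

U = 2.0**-53                   # unit roundoff of binary64

@dataclass
class Approx:
    val: float                 # computed value
    err: float = 0.0           # bound on accumulated relative error

    def __mul__(self, other):
        # To first order, relative errors add under multiplication,
        # plus one rounding for the multiply itself.
        return Approx(self.val * other.val, self.err + other.err + U)

x = Approx(1.1)
y = Approx(2.2)
z = x * y * y                  # two multiplications of exact inputs
print(z.val, z.err)            # err == 2 * U
</code></pre>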
</div>
</details>
<details>
<summary>
<time>
Jul 11, 2024
</time>
<div class="title">
<a href="talks/fptalks24.html">FPTalks 2024 annual workshop</a>
</div>
<div class="speaker">
12 speakers at the cutting edge of numerics research
</div>
</summary>
</details>
<details>
<summary>
<time>
Jun 6, 2024
</time>
<div class="title">
On the precision loss in approximate homomorphic encryption
</div>
<div class="speaker">
Rachel Player, Royal Holloway University of London
</div>
</summary>
<div class="abstract">
<p>
Since its introduction at Asiacrypt 2017, the CKKS approximate homomorphic encryption
scheme has become one of the most widely used and implemented homomorphic encryption
schemes. Due to the approximate nature of the scheme, application developers using
CKKS must ensure that the evaluation output is within a tolerable error of the corresponding
plaintext computation. Choosing appropriate parameters requires a good understanding of
how the noise will grow through the computation. A strong understanding of the noise
growth is also necessary to limit the performance impact of mitigations to the attacks
on CKKS presented by Li and Micciancio (Eurocrypt 2021).
</p>
<p>
In this work we present a
comprehensive noise analysis of CKKS that considers noise coming from both the encoding and
homomorphic operations. Our main contribution is the first average-case analysis for CKKS
noise, and we also introduce refinements to prior worst-case noise analyses. We develop
noise heuristics both for the original CKKS scheme and the RNS variant presented
at SAC 2018. We then evaluate these heuristics by comparing the predicted noise growth
with experiments in the HEAAN and FullRNS-HEAAN libraries, and by comparing with a
worst-case noise analysis as done in prior work. Our findings show mixed results: while
our new analyses lead to heuristic estimates that more closely model the observed noise
growth than prior approaches, the new heuristics sometimes slightly underestimate the
observed noise growth. This evidences the need for implementation-specific noise analyses
for CKKS, which recent work has shown to be effective for implementations of similar schemes.
This is joint work with Anamaria Costache, Benjamin R. Curtis, Erin Hales, Sean Murphy,
and Tabitha Ogilvie.
</p>
</div>
</details>
<details>
<summary>
<time>
May 2, 2024
</time>
<div class="title">
ReLU hull approximation
</div>
<div class="speaker">
Zhongkui Ma, The University of Queensland
</div>
</summary>
<div class="abstract">
<p>
Neural networks have offered distinct advantages over traditional
techniques. However, the opaque neural networks render their