<!doctype html>
<html>
<head>
<meta charset="utf-8" />
<title>FPTalks</title>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link rel="stylesheet" type="text/css" href="fpbench.css">
<style>
#info {
padding-bottom: 1.25em;
margin-bottom: 1.5em;
border-bottom: 1px solid gray;
}
details {
clear: both;
margin-bottom: 1.5em;
}
summary {
list-style-type: none;
}
time {
float: right;
}
.title {
color: #198C0E;
}
details:has(.abstract) .title {
cursor: pointer;
}
.speaker {
font-style: italic;
font-size: 90%;
}
.abstract {
font-size: 80%;
margin-bottom: 2em;
}
</style>
</head>
<body>
<header>
<a href='index.html'>
<img src='img/logo.png' height='150' alt='FPTalks Logo' />
<h1>FPTalks</h1>
</a>
<p>Numerics community meetings and resources</p>
<ul>
<li><a href="index.html">Events</a></li>
<li><a href="education.html">Education</a></li>
<li><a href="community.html">Tools</a></li>
<li><a href="https://groups.google.com/a/fpbench.org/g/fpbench">Mailing List</a></li>
<li><a href="about.html">About</a></li>
</ul>
</header>
<div class="pitch">
<a href="https://groups.google.com/a/fpbench.org/g/fpbench">Join us</a> monthly to talk numerics.
</div>
<p id='info'>
FPTalks meets for an hour, on Zoom, on the first Thursday of every month at 9am Pacific.<abbr
title="Except that we skip months for the annual workshop and for the winter holidays."><sup>†</sup></abbr>
You can <a href="https://forms.gle/H5aodraRcgcJFq6W8">submit a proposal</a> to present.
</p>
<main>
<!--
<details>
<summary>
<time>TIME</time>
<div class="title">
TITLE, use sentence not title case
</div>
<div class="speaker">
TODO, University of TODO
</div>
</summary>
<div class="abstract">
<p>
TODO
</p>
</div>
</details>
-->
<!--
<details>
<summary>
<time>Aug 6, 2026</time>
<div class="title">
<a href="talks/fptalks26.html">FPTalks 2026 annual workshop</a>
</div>
<div class="speaker">
Details coming soon
</div>
</summary>
</details>
-->
<!--
<details>
<summary>
<time>July 2, 2026</time>
<div class="title">
Semantics, operations, and properties of P3109 floating-point representations in Lean
</div>
<div class="speaker">
Tung-Che Chang, Rutgers University
</div>
</summary>
<div class="abstract">
<p>
The upcoming IEEE-P3109 standard for low-precision
floating-point arithmetic can become the foundation of
future machine learning hardware and software. Unlike the
fixed types of IEEE-754, P3109 introduces a parametric
framework defined by bitwidth, precision, signedness, and
domain. This flexibility results in a vast combinatorial
space of formats -- some with as little as one bit of
precision -- alongside novel features such as stochastic
rounding and saturation arithmetic. These deviations create
a unique verification gap that this paper intends to
address.
</p>
<p>
This paper presents FLoPS, Formalization in Lean of the
P3109 Standard, which is a comprehensive formal model of
P3109 in Lean. Our work serves as a rigorous,
machine-checked specification that facilitates deep analysis
of the standard. We demonstrate the model's utility by
verifying foundational properties and analyzing key
algorithms within the P3109 context. Specifically, we reveal
that FastTwoSum exhibits a novel property of computing exact
"overflow error" under saturation using any rounding mode,
whereas previously established properties of the
ExtractScalar algorithm fail for formats with one bit of
precision. This work provides a verified foundation for
reasoning about P3109 and enables formal verification of
future numerical software. Our Lean development is open
source and publicly available.
</p>
<p>
More in our <a href="https://arxiv.org/abs/2602.15965">preprint</a>.
</p>
</div>
</details>
-->
<!--
<details>
<summary>
<time>June 4, 2026</time>
<div class="title">
TITLE, use sentence not title case
</div>
<div class="speaker">
Jeffrey Sarnoff, IEEE
</div>
</summary>
<div class="abstract">
<p>
TODO
</p>
</div>
</details>
-->
<details>
<summary>
<time>May 7, 2026</time>
<div class="title">
Accurate models of NVIDIA tensor cores
</div>
<div class="speaker">
Mantas Mikaitis, University of Leeds
</div>
</summary>
<div class="abstract">
<p>
Matrix multiplication is a fundamental operation in both neural
network training and inference. To accelerate it, Graphics
Processing Units (GPUs) implement matrix multiplication in hardware.
Due to the increased throughput over software-based matrix
multiplication, these multipliers are increasingly used outside of AI
to accelerate various applications in scientific computing. However,
matrix multipliers targeted at AI are at present not compliant with
IEEE 754 floating-point arithmetic behaviour, with different vendors
offering different numerical features. This leads to non-reproducible
results across different generations of GPU architectures, at the
matrix multiply-accumulate instruction level.
</p>
<p>
To study numerical characteristics of matrix multipliers—such
as rounding behaviour, accumulator width, normalization points, extra
carry bits, and others—test vectors are typically constructed.
Yet, these vectors may or may not distinguish between different
hardware models, and due to limited hardware availability, their
reliability across many different platforms remains largely untested.
We present software models for emulating the inner product behavior of
low- and mixed-precision matrix multipliers in the V100, A100, H100
and B200 data center GPUs in most supported input formats of interest
to mixed-precision algorithm developers: 8-, 16-, and 19-bit floating
point. These matrix multiplier models are first approximated by
determining the numerical features via test vectors designed to trigger
outputs sensitive to bit level differences in the implementation,
followed by semi-exhaustive comparison (randomised input vectors of
10<sup>7</sup> values) between the models and the actual GPU matrix
multipliers—this process is repeated until the model is bit
accurate. These models enable verification of test vectors before
applying them to real hardware and also support computational
scientists and mixed-precision algorithm developers with easy-to-use
accurate models available in MATLAB—we demonstrate their use in
multi-word emulation algorithms for matrix multiplication.
</p>
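<p>
For intuition, here is one probe in the spirit of those test vectors
(an editor's NumPy sketch, not the authors' models): it distinguishes
a binary16 accumulator from a wider one by adding terms worth half an
ulp of 1.0, which a narrow accumulator rounds away at every step.
</p>
<pre><code>import numpy as np

# Half an ulp of 1.0 in binary16 is 2**-11: adding it to 1.0 ties to
# even and rounds back to 1.0, so a binary16 accumulator never moves.
x = np.full(4, 2.0**-11, dtype=np.float16)

acc16 = np.float16(1.0)
for xi in x:
    acc16 = np.float16(acc16 + xi)   # rounds back to 1.0 at each step

acc32 = np.float32(1.0)
for xi in x:
    acc32 += np.float32(xi)          # a wide accumulator keeps every term

print(acc16)              # 1.0    -> narrow accumulator
print(np.float16(acc32))  # 1.002  -> wide accumulator, rounded once at the end
</code></pre>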
<p>
More in our <a href="https://arxiv.org/abs/2512.07004">preprint</a>.
</p>
</div>
</details>
<details>
<summary>
<time>Apr 2, 2026</time>
<div class="title">
Cost of soundness in mixed-precision tuning
</div>
<div class="speaker">
Anastasia Isychev, TU Wien
</div>
</summary>
<div class="abstract">
<p>
Numerical code is often executed repetitively and on hardware with
limited resources, which makes it a perfect target for optimizations.
One of the most effective ways to boost performance—especially in
terms of runtime—is by reducing the precision of computations.
However, low precision can introduce significant rounding errors,
potentially compromising the correctness of results. Mixed-precision
tuning addresses this trade-off by assigning the lowest possible
precision to a subset of variables and arithmetic operations in the
program while ensuring that the overall error remains within
acceptable bounds. State-of-the-art tools validate the accuracy of
optimized programs using either sound static analysis or dynamic
sampling. Sound methods are often considered safer but overly
conservative, while dynamic methods are more aggressive and potentially
more effective; the question remains: how do these approaches compare
in practice?
</p>
<p>
We present the first comprehensive evaluation of existing
mixed-precision tuning tools for floating-point programs, offering a
<em>quantitative comparison</em> between sound static and (unsound) dynamic
approaches. We measure the trade-offs between performance gains,
utilizing optimization potential, and the soundness guarantees on the
accuracy—what we refer to as the cost of soundness. Our experiments on
the standard FPBench benchmark suite challenge the common belief that
dynamic optimizers consistently generate faster programs. In fact, for
small straight-line numerical programs, we find that sound tools
enhanced with regime inference match or outperform dynamic ones, while
providing formal correctness guarantees, albeit at the cost of
increased optimization time. Standalone sound tools, however, are
overly conservative, especially when accuracy constraints are tight;
whereas dynamic tools are consistently effective for different
targets, but exceed the maximum allowed error by up to 9 orders of
magnitude.
</p>
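<p>
As a toy version of the search space involved (an editor's sketch,
hedged: the tools compared in the talk automate this over whole
programs), one can demote a single operation to a lower precision and
keep the demotion only if the final error stays under the target bound:
</p>
<pre><code>import numpy as np

x = np.float64(0.7)
ref = np.sin(x) * np.exp(x)                            # all binary64
mixed = np.float64(np.sin(np.float32(x))) * np.exp(x)  # sin demoted to binary32

rel_err = abs(mixed - ref) / abs(ref)
print(rel_err)   # ~1e-8: the demotion is kept iff this meets the error bound
</code></pre>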
<p>
More in our <a href="https://aisychev.github.io/papers/oopsla25-cos.pdf">OOPSLA 2025 paper</a>.
</p>
</div>
</details>
<details>
<summary>
<time>Mar 5, 2026</time>
<div class="title">
Challenges and opportunities in large-scale linear systems
</div>
<div class="speaker">
Ichitaro Yamazaki, Sandia National Laboratories
</div>
</summary>
<div class="abstract">
<p>
Linear solvers often play paramount roles in large-scale
scientific and engineering simulations. To maintain the
quality of our software's functionality and performance
across varied hardware, our solvers are implemented within
the open-source software framework Trilinos. However, it is
still a challenge to maintain robust implementations of the
parallel solvers, and we often spend a significant amount of
time maintaining our solvers on each new hardware platform.
Furthermore, to effectively utilize current large-scale
computers, many techniques have been proposed (communication
avoiding/hiding, asynchronous communication, mixed precision,
random sketching, etc.), which often pose further challenges
to maintaining the reliability of our solvers. In this talk,
we discuss some of these techniques as integrated into our
solvers, along with the associated challenges. The hope is to
identify potential collaborations to enhance the reliability
of our solver implementations on current and emerging
computers.
</p>
<p>
<a href="https://drive.google.com/file/d/1pRxyKsbA2tp4uE2_SmRn4-u3sIfN7Red/view?usp=sharing">Slides</a>.
</p>
</div>
</details>
<details>
<summary>
<time>
Nov 16, 2025
</time>
<div class="title">
<a href="http://fpanalysistools.org/SC25/">Tutorial on Floating-point Analysis tools</a>
</div>
<div class="speaker">
A tutorial run at Supercomputing 2025
</div>
</summary>
<div class="abstract">
High performance computing and machine learning applications
increasingly rely on mixed-precision arithmetic on CPUs and
GPUs for superior performance. However, this shift introduces
several challenging numerical issues such as increased
round-off errors, and INF and NaN exceptions that can render
the computed solutions useless. At present, this places a
heavy burden on developers, interrupting their work while they
diagnose these problems manually. This tutorial presents three
tools that target specific issues leading to floating-point
bugs. First, we present FPChecker, which not only detects and
reports INF/NaN exceptions in parallel and distributed CPU
codes, but also tells programmers about the exponent value
ranges for avoiding exceptions while also minimizing rounding
errors. Second, we present GPU-FPX, which detects
floating-point exceptions generated by NVIDIA GPUs, including
their Tensor Cores via a "nixnan" extension to GPU-FPX. Third,
we present FloatGuard, a unique tool that detects exceptions
in AMD GPUs. The tutorial is aimed at helping programmers
avoid exception bugs; for this, we will demonstrate our tools
on simple examples with seeded bugs. Attendees may optionally
install and run our tools. The tutorial also allocates
question/answer time to address real situations faced by the
attendees.
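<p>
As a minimal demonstration of the bug class these tools target (an
editor's sketch of the arithmetic only; the tools themselves instrument
GPU and CPU binaries), a low-precision overflow produces INF, and one
more operation silently turns it into NaN:
</p>
<pre><code>import numpy as np

np.seterr(over="ignore", invalid="ignore")  # silence NumPy's warnings

x = np.float16(50000.0)
y = x * np.float16(2.0)   # 100000 overflows binary16 (max 65504) -> inf
z = y - y                 # inf - inf -> nan
print(y, z, z == z)       # inf nan False: NaN is not even equal to itself
</code></pre>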
</div>
</details>
<details>
<summary>
<time>Nov 6, 2025</time>
<div class="title">
Programming with first-class rounding contexts
</div>
<div class="speaker">
Brett Saiki, University of Washington and Intel
</div>
</summary>
<div class="abstract">
<p>
Precise specification of numerical algorithms is essential for
design-space exploration, simulation, and verification of possible
designs. FPCore, a Scheme-like language that makes rounding explicit and
separate from exact mathematical operations, provides one such high-level
abstraction. However, its rounding context construct cannot capture a key
feature of an emerging class of algorithms: rounding that depends on
runtime values. To express and study these designs, we propose making
rounding contexts first-class values.
</p>
<p>
We implement these ideas in FPy, a domain-specific language for exploring
numerical algorithms. FPy allows programmers to capture both existing
FPCore programs, and algorithms with value-dependent rounding that cannot
be captured in FPCore. First-class rounding contexts enable systematic
exploration, since rounding parameters may be program inputs rather than
controlled via ad-hoc metaprogramming. We demonstrate FPy’s effectiveness
by analyzing how design parameters affect numerical error on a family of
designs from prior work that features value-dependent rounding.
</p>
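<p>
As a rough illustration of what "first-class" means here (an editor's
concept sketch in plain Python; this is not FPy's actual API), the
rounding context is ordinary data that operations consult, so it can
be computed from run-time values:
</p>
<pre><code>import math
from dataclasses import dataclass

@dataclass
class Ctx:
    p: int                       # significand bits of the rounding context

def rnd(x: float, ctx: Ctx) -> float:
    m, e = math.frexp(x)         # x = m * 2**e with |m| in [0.5, 1)
    return math.ldexp(round(m * 2**ctx.p), e - ctx.p)

def add(x, y, ctx):              # a rounded op parameterized by the context
    return rnd(x + y, ctx)

# Value-dependent rounding: the context is chosen from a run-time value.
x = 1.0 + 2.0**-20
ctx = Ctx(p=8) if x > 1.0 else Ctx(p=24)
print(add(x, 3.0, ctx))          # 4.0 when rounded to 8 bits
</code></pre>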
<p>
Please see the
<a href="https://github.com/bksaiki/fpy">repo</a>
for more info.
Try out the pip package
<a href="https://pypi.org/project/fpy2/">fpy2</a>!
</p>
</div>
</details>
<details>
<summary>
<time>Oct 2, 2025</time>
<div class="title">
RLibm-MultiRound: correct, rounding-mode-agnostic libraries
</div>
<div class="speaker">
Sehyeok Park, Rutgers University
</div>
</summary>
<div class="abstract">
<p>
Our RLibm project generates a single implementation for an elementary
function that produces correctly rounded results for multiple rounding
modes and representations with up to 32 bits. The key insight is to
build polynomials that produce the correctly rounded result for a
representation with two additional bits when compared to the largest
target representation and with the "non-standard" round-to-odd rounding
mode, which makes double rounding the RLibm math library result to any
smaller target representation innocuous. The resulting approximations
are implemented with machine-supported floating-point operations with
the round-to-nearest rounding mode. When an application uses any other
rounding mode, RLibm saves the application's rounding mode, changes the
system's rounding mode to round-to-nearest, computes the correctly
rounded result, and restores the application's rounding mode. This
frequent change of rounding modes has a performance cost.
</p>
<p>
In this talk I will cover two new methods to avoid the frequent changes
to the rounding mode and the dependence on round-to-nearest. First, our
new rounding-invariant outputs method emulates round-to-zero under all
rounding modes to implement RLibm's polynomial approximations. Second,
our rounding-invariant input bounds method factors any rounding error
due to different rounding modes using interval bounds in the RLibm
pipeline. Both methods make a different set of trade-offs and improve
the performance of resulting libraries by more than 2X.
</p>
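<p>
The round-to-odd trick is easy to state in code. Here is an editor's
Python sketch (an illustration, not RLibm's implementation) of rounding
an exact value to p bits with round-to-odd, plus a check that double
rounding through two extra bits is innocuous:
</p>
<pre><code>from fractions import Fraction

def round_to_odd(x: Fraction, p: int) -> Fraction:
    """Round x to a p-bit significand with round-to-odd: truncate toward
    zero and, if anything was lost, force the last bit to 1."""
    if x == 0:
        return x
    sign = 1 if x > 0 else -1
    x = abs(x)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    while Fraction(2) ** e > x:          # find e: x in [2**e, 2**(e+1))
        e -= 1
    while x >= Fraction(2) ** (e + 1):
        e += 1
    ulp = Fraction(2) ** (e - p + 1)     # grid spacing at this exponent
    q, r = divmod(x, ulp)                # q = truncated significand
    if r != 0 and q % 2 == 0:
        q += 1                           # inexact: force the "odd" sticky bit
    return sign * q * ulp

# Rounding to 55 bits with round-to-odd, then to binary64 (nearest-even),
# matches rounding the exact value to binary64 directly.
x = Fraction(1, 3) + Fraction(1, 10**40)
assert float(round_to_odd(x, 55)) == float(x)
</code></pre>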
<p>
Please see the
<a href="https://arxiv.org/abs/2504.07409">paper</a>
for more info.
</p>
</div>
</details>
<details>
<summary>
<time>Sep 4, 2025</time>
<div class="title">
Bean: a language for backward error analysis
</div>
<div class="speaker">
Laura Zielinski, Cornell University
</div>
</summary>
<div class="abstract">
<p>
Backward error analysis offers a method for assessing the quality of
numerical programs in the presence of floating-point rounding errors.
However, techniques from the numerical analysis literature for
quantifying backward error require substantial human effort, and there
are currently no tools or automated methods for statically deriving
sound backward error bounds. To address this gap, we propose Bean, a
typed first-order programming language designed to express quantitative
bounds on backward error. Bean's type system combines a graded coeffect
system with strict linearity to soundly track the flow of backward
error through programs. We prove the soundness of our system using a
novel categorical semantics, where every Bean program denotes a triple
of related transformations that together satisfy a backward error
guarantee.
</p>
<p>
To illustrate Bean's potential as a practical tool for automated
backward error analysis, we implement a variety of standard algorithms
from numerical linear algebra in Bean, establishing fine-grained
backward error bounds via typing in a compositional style. We also
develop a prototype implementation of Bean that infers backward error
bounds automatically. Our evaluation shows that these inferred bounds
match worst-case theoretical relative backward error bounds from the
literature, underscoring Bean's utility in validating a key property of
numerical programs: numerical stability.
</p>
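<p>
For a concrete feel for the property Bean tracks (an editor's worked
example; Bean itself derives such bounds statically, by typing): a
computed sum is the exact sum of slightly perturbed inputs, and the
perturbation is bounded by the unit roundoff.
</p>
<pre><code>from fractions import Fraction as F

x, y = 0.1, 0.2
s = x + y                                    # computed: 0.30000000000000004

# fl(x + y) = x*(1 + d) + y*(1 + d): recover the exact perturbation d.
d = (F(s) - (F(x) + F(y))) / (F(x) + F(y))
u = F(1, 2**53)                              # binary64 unit roundoff

print(float(d))                              # ~9.3e-17
assert u >= abs(d)                           # backward error bounded by u
</code></pre>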
<p>
Please see the
<a href="https://arxiv.org/abs/2501.14550">paper</a> and
<a href="https://github.com/Athena-Types/Bean">code</a>
for more info.
</p>
</div>
</details>
<details>
<summary>
<time>Aug 7, 2025</time>
<div class="title">
Numerical Computing with IEEE Floating Point Arithmetic
</div>
<div class="speaker">
Michael Overton, New York University
</div>
</summary>
<div class="abstract">
<p>
Although the basic principles of IEEE floating point arithmetic have
remained largely unchanged since the first edition of my book
Numerical Computing with IEEE Floating Point Arithmetic was published
by SIAM in 2001, the technology that supports it has changed
enormously. Every chapter of the book has been rewritten extensively,
and two new chapters have been added: one on computations with higher
precision than that mandated by the standard, needed for a variety of
scientific applications, and one on computations with lower precision
than was ever contemplated by those who wrote the standard, driven by
the massive computational demands of machine learning. Topics include
the rationale for floating point representation, correctly rounded
arithmetic and exception handling, support for the standard by
floating point microprocessors and programming languages, and an
introduction to the key concepts of cancellation, conditioning and
stability. The book gives many technical details that are not readily
available elsewhere. The second edition was published by SIAM in May
2025.
</p>
</div>
</details>
<details>
<summary>
<time>Aug 7, 2025</time>
<div class="title">
Detecting and diagnosing FP exceptions in GPUs and CPUs
</div>
<div class="speaker">
Mark Baranowski and Ganesh Gopalakrishnan, University of Utah
</div>
</summary>
<div class="abstract">
<p>
Hundreds of GPU programmers are being stymied by the NaNs and INFs that
arise during computation, often polluting loss functions (ML) and
residuals (HPC). The debugging problem is exacerbated due to GPU kernels
being closed-source and launched from scripts written in Python, Julia
etc. While one may build binary analysis tools to analyze exceptions,
separate tools are needed for different GPUs. Finally, one would like
to detect exceptions at a higher level (e.g., LLVM), but the lack of
publicly available GPU support in LLVM makes such tools easier to
target at CPUs.
</p>
<p>
In this talk, we will briefly survey tools that can help detect and
diagnose floating-point exceptions. The bulk of this talk will be
devoted to covering the tools written at Utah: namely GPU-FPX (for GPU
SIMT cores) and its 'nixnan' variant (for GPU Tensor Cores). We run a
few demos that illustrate the ease of use of GPU-FPX on a variety of
codes: simple data compressors, simple GPTs, and Python/Julia codes.
While GPU-FPX currently helps "X-ray" down the stack of kernel calls,
knowing what these kernels do and which of the detected exceptions are
relevant -- and which exception coercion rules (to normal values) are
sound -- remains unsolved. The only clear guidance we know of --
consistent exception handling due to Demmel -- does not seem to hold and
is inefficient if literally followed. Given that exceptions occur with
such high frequencies already and will multiply in their manifestations
on different hardware and software, clear guidelines for exception
coercion and blame assignment are needed.
</p>
<p>
The talk will highlight how we mined exceptions from Tensor Cores in
nixnan (Reference:
<a href="https://github.com/ganeshutah/PLDI25-Array-Workshop">github.com/ganeshutah/PLDI25-Array-Workshop</a>
), and also summarize FloatGuard (AMD Exception Checking tool from UC Davis:
HPDC'25) and FPChecker (LLVM Exception Checking from Livermore:
IISWC'22). We will devote ~15 mins to garner audience feedback to help
us prepare for our SC'25 tutorial on this topic this November in St.
Louis, MO.
</p>
<p>
Additional input from:
Xinyi Li (Utah),
Dolores Miao (UC Davis),
Harvey Dam (Utah),
Cindy Rubio-Gonzalez (UC Davis),
and Ignacio Laguna (LLNL).
</p>
</div>
</details>
<details>
<summary>
<time>
Jul 10, 2025
</time>
<div class="title">
<a href="talks/fptalks25.html">FPTalks 2025 annual workshop</a>
</div>
<div class="speaker">
12 speakers at the cutting edge of numerics research
</div>
</summary>
</details>
<details>
<summary>
<time>May 1, 2025</time>
<div class="title">
Fast branch-free extended-precision floating-point arithmetic
</div>
<div class="speaker">
David K. Zhang, Stanford University
</div>
</summary>
<div class="abstract">
<p>
Many scientific and mathematical problems demand extremely precise calculations
exceeding the limits of the double precision (IEEE binary64) floating-point format.
However, existing methods for extended-precision computation involve complex
branching algorithms that perform poorly on modern data-parallel processors,
including SIMD CPUs and GPUs. In this talk, I introduce a class of algorithms
called floating-point accumulation networks (FPANs) that enable fast branch-free
extended-precision arithmetic. FPAN-based algorithms outperform standard multiprecision
libraries by orders of magnitude, achieving up to 11.7x the peak performance of QD,
34.4x over CAMPARY, 35.6x over MPFR, and 41.4x over FLINT. I also introduce a new
formal verification technique that leverages automatic theorem provers to rigorously
establish the correctness of these new algorithms.
</p>
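<p>
For background, the error-free transformation that such networks
compose is easy to show (an editor's sketch of the classic branch-free
TwoSum; the talk's FPANs arrange many of these into larger networks).
It uses six floating-point operations and no comparisons, which is
what makes it friendly to SIMD and GPU execution:
</p>
<pre><code>def two_sum(a: float, b: float):
    """Return s = fl(a + b) and the exact error t, so a + b == s + t."""
    s = a + b
    ap = s - b               # the part of s attributed to a
    bp = s - ap              # the part of s attributed to b
    t = (a - ap) + (b - bp)  # what rounding discarded, recovered exactly
    return s, t

s, t = two_sum(1e16, 1.0)
print(s, t)                  # 1e+16 1.0: the 1.0 lost in s survives in t
</code></pre>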
</div>
</details>
<details>
<summary>
<time>Apr 3, 2025</time>
<div class="title">
Fast sound error bounds for mixed-precision real evaluation
</div>
<div class="speaker">
Artem Yadrov, University of Utah
</div>
</summary>
<div class="abstract">
<p>
Evaluating real-valued expressions to high precision is a key
building block in computational mathematics, physics, and numerics.
A typical implementation uses a uniform precision for each
operation, and doubles that precision until the real result can be
bounded to some sufficiently narrow interval. However, this is
wasteful: usually only a few operations really need to be performed
at high precision, and the bulk of the expression could use much
lower precision. Uniform precision can also waste iterations
discovering the necessary precision and then still overestimate by
up to a factor of two. We propose to instead use mixed-precision
interval arithmetic to evaluate real-valued expressions. A key
challenge is deriving the mixed-precision assignment both soundly
and quickly. To do so, we introduce a sound variation of error
Taylor series and condition numbers, specialized to interval
arithmetic, that can be evaluated with minimal overhead thanks to
an “exponent trick”. Our implementation, Reval, achieves an
average speed-up of 1.47× compared to the state-of-the-art Sollya
tool, with the speed-up increasing to 4.92× on the most difficult
input points. An examination of the precisions used with and without
precision tuning shows that the speed-up results come from quickly
assigning lower precisions to the majority of operations.
</p>
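<p>
For contrast with Reval, the uniform-precision baseline is only a few
lines (an editor's sketch, assuming mpmath's <code>iv</code> interval
context; Reval instead assigns each operation its own precision):
</p>
<pre><code>from mpmath import iv

def eval_to_width(expr, width):
    """Evaluate expr() in interval arithmetic, doubling the working
    precision until the enclosure of the real result is narrow enough."""
    prec = 24
    while True:
        iv.prec = prec
        y = expr()
        if float(y.delta) > width:   # y.delta = width of the enclosure
            prec *= 2                # redo every operation at twice the precision
            continue
        return y, prec

y, prec = eval_to_width(lambda: iv.sin(iv.mpf(10) ** 20), 1e-15)
print(prec, y)                       # sin(1e20) needs well beyond 53 bits
</code></pre>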
</div>
</details>
<details>
<summary>
<time>Mar 6, 2025</time>
<div class="title">
Floating-point neural network verification
</div>
<div class="speaker">
Edoardo Manino, The University of Manchester
</div>
</summary>
<div class="abstract">
<p>
Safety-critical systems with neural network components require strong
guarantees. While existing verification techniques have shown great
progress towards this goal, they mostly reason on real-valued
abstractions of neural networks. As soon as we consider their
floating-point behaviour, the associated verification problem becomes
harder. In this talk, we discuss a software verification approach to
this problem. In doing so, we introduce NeuroCodeBench, a benchmark
of neural network code for software verification. With it, we show
the advantages and shortcomings of reasoning on neural networks at
the floating-point level.
</p>
</div>
</details>
<details>
<summary>
<time>
Feb 6, 2025
</time>
<div class="title">
LP meets PL: efficient linear programming using a geometric lens
</div>
<div class="speaker">
Mridul Aanjaneya, Rutgers University
</div>
</summary>
<div class="abstract">
<p>
Linear programs (LP) arise in many domains, such as robotics,
databases, machine learning, computer graphics, etc. LP solvers are
also widely used in the PL community for analyzing vulnerabilities in C
source code, designing correctly rounded Math libraries, repairing deep
neural networks, Presburger arithmetic for polyhedral compilation, etc.
Linear programs with billions of constraints are beyond the reach of
modern LP solvers. However, for many practical applications, the linear
program is "low-dimensional", i.e., it has many constraints but only a
few unknown variables. In this talk, we'll first discuss how random
sampling can be used to provably solve feasible low-dimensional linear
programs with billions of constraints, with guarantees on the expected
number of iterations. Next, we'll discuss how this approach requires
enhancements for infeasible linear programs, where a small number of
conflicting constraints make the linear program have no solution. We'll
show that a geometric perspective can be used for efficiently
eliminating the conflicting constraints and computing a solution that
satisfies the maximum number of constraints. We'll present applications
of our LP solver for generating correctly rounded Math libraries for
floating-point variants. Time permitting, we'll also discuss
applications in interpretable machine learning.
</p>
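<p>
To make the sampling idea concrete, here is an editor's simplified
Clarkson-style loop (hedged: not the speaker's algorithm, and with a
box constraint added so early sub-LPs stay bounded): solve a small
random subset, then fold in constraints the tentative optimum violates.
</p>
<pre><code>import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d = 1_000_000, 3                       # many constraints, few unknowns
A = rng.normal(size=(n, d))
b = rng.uniform(1.0, 2.0, size=n)         # x = 0 is feasible since b > 0
c = -np.ones(d)                           # maximize sum(x)
box = [(-100.0, 100.0)] * d               # keeps early sub-LPs bounded

S = rng.choice(n, size=6 * d * d, replace=False)   # small working set
while True:
    x = linprog(c, A_ub=A[S], b_ub=b[S], bounds=box).x
    viol = np.flatnonzero(A @ x > b + 1e-9)
    if viol.size == 0:
        break                             # x satisfies all n constraints
    S = np.union1d(S, rng.choice(viol, size=min(viol.size, 6 * d * d)))

print(x)
</code></pre>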
</div>
</details>
<details>
<summary>
<time>
Dec 5, 2024
</time>
<div class="title">
Aster: sound mixed fixed-point quantizer for neural networks
</div>
<div class="speaker">
Debasmita Lohar, Karlsruhe Institute of Technology
</div>
</summary>
<div class="abstract">
<p>
Neural networks are increasingly becoming integral to safety-critical
applications, such as controllers in embedded systems. While formal
safety verification focuses on idealized real-valued networks, practical
applications require quantization to finite precision, inevitably
introducing roundoff errors. Manually optimizing precision, especially
for fixed-point implementation, while ensuring safety is complex and
time-consuming.
</p>
<p>
In this talk, I will introduce Aster, the sound, fully automated,
mixed-precision, fixed-point quantizer for deep feed-forward neural
networks. Aster reduces the quantization problem to a mixed-integer
linear programming (MILP) problem, thus efficiently determining minimal
precision to guarantee predefined error bounds. Our evaluations show
that Aster's optimized code reduces machine cycles when compiled to an
FPGA with a commercial HLS compiler, compared to the (sound)
state of the art. Furthermore, Aster handles significantly more
benchmarks faster, especially for larger networks with thousands of
parameters.
</p>
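<p>
The cost model underneath is simple to state (an editor's sketch;
Aster's contribution is solving the precision-assignment problem as a
MILP, which this does not show): quantizing to f fractional bits costs
at most 2<sup>-(f+1)</sup> of roundoff per value, so fewer bits mean
cheaper hardware but a larger error term for the safety bound to absorb.
</p>
<pre><code>import numpy as np

def to_fixed(x, f):
    """Round x to a fixed-point grid with f fractional bits."""
    return np.round(x * 2.0**f) / 2.0**f

w = 0.123456789
for f in (4, 8, 16):
    err = abs(to_fixed(w, f) - w)
    assert 2.0**-(f + 1) >= err      # roundoff is at most half a grid step
    print(f, err)
</code></pre>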
</div>
</details>
<details>
<summary>
<time>
Nov 17, 2024
</time>
<div class="title">
<a href="http://fpanalysistools.org/SC24/">Tutorial on Floating-point Analysis tools</a>
</div>
<div class="speaker">
A tutorial run at Supercomputing 2024
</div>
</summary>
<div class="abstract">
Floating-point arithmetic is central to HPC and ML, with the
variety of number formats, hardware platforms, and compilers
exploding in this era of heterogeneity. This unfortunately
increases the incidence of numerical issues including
exceptions such as Infinity and NaN that can render the
computed results unreliable or change control-flows,
introduces excessive rounding that breaks the assumptions made
in the numerical algorithm in use, and overall causes result
non-reproducibility when code is optimized or ported across
platforms. In this tutorial, we present three novel tools: (1)
GPU-FPX, which exposes silent exceptions in NVIDIA GPU
computations, (2) Ciel, which pinpoints where compilers
silently over-optimize and cause non-reproducibility, and (3)
Herbie, which improves the accuracy of a programmer-written
expression, significantly reducing rounding error or
eliminating exceptions. This half-day tutorial will consist of
(1) presentations of floating-point basics, (2) demos of all
our numerical debugging tools, presenting their principle of
operation and ideal usage contexts, and (3) plenty of time for
Q/A, especially on using these tools within the organization
of the attendees. New and emerging technologies such as Tensor
Cores will be introduced by showing how to test for
non-portability of codes across them.
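<p>
As a taste of the third tool (an editor's illustration of the kind of
rewrite Herbie finds; this textbook example is not Herbie's output):
sqrt(x+1) - sqrt(x) cancels catastrophically for large x, while the
algebraically equivalent form 1/(sqrt(x+1) + sqrt(x)) does not.
</p>
<pre><code>import math

x = 1e16
naive = math.sqrt(x + 1) - math.sqrt(x)
stable = 1.0 / (math.sqrt(x + 1) + math.sqrt(x))

print(naive)    # 0.0:   every significant digit cancelled
print(stable)   # 5e-09: close to the true value
</code></pre>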
</div>
</details>
<details>
<summary>
<time>Nov 7, 2024</time>
<div class="title">
Exploring FPGA designs for MX and beyond
</div>
<div class="speaker">
Ebby Samson, Imperial College London
</div>
</summary>
<div class="abstract">
<p>
A number of companies recently worked together to release the new Open
Compute Project MX standard for low-precision computation, aimed at
efficient neural network implementation. In our work, we describe and
evaluate the first open-source FPGA implementation of the arithmetic
defined in the standard. Our designs fully support all the standard's
concrete formats for conversion into and out of MX formats and for the
standard-defined arithmetic operations, as well as arbitrary
fixed-point and floating-point formats. Certain elements of the
standard are left as implementation-defined, and we present the first
concrete FPGA-inspired choices for these elements. Our library of
optimized hardware components is available open source, alongside our
open-source Pytorch library for quantization into the new standard,
integrated with the Brevitas library so that the community can develop
novel neural network designs quantized with MX formats in mind. Our
testing shows that MX is very effective for formats such as INT5 or FP6
which are not natively supported on GPUs. This gives FPGAs an advantage
as they have the flexibility to implement a custom datapath and take
advantage of the smaller area footprints offered by these formats.
</p>
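<p>
In outline, the MX idea is block scaling (an editor's simplified
sketch, not the standard's exact rules): a block of 32 values shares
one power-of-two scale derived from the block's largest magnitude, and
each element is stored in a narrow format; here a toy integer grid
stands in for INT5.
</p>
<pre><code>import numpy as np

def mx_quantize(block, elem_bits=5):
    """Quantize a block to (small ints, one shared power-of-two scale)."""
    qmax = 2 ** (elem_bits - 1) - 1                 # 15 for a 5-bit grid
    amax = np.max(np.abs(block))
    if amax == 0:
        return np.zeros(block.shape, dtype=np.int8), 1.0
    scale = 2.0 ** np.ceil(np.log2(amax / qmax))    # shared E8M0-style scale
    q = np.clip(np.round(block / scale), -qmax, qmax)
    return q.astype(np.int8), scale

block = np.random.randn(32).astype(np.float32)
q, s = mx_quantize(block)
print(np.max(np.abs(block - q * s)))                # per-block quantization error
</code></pre>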
</div>
</details>
<details>
<summary>
<time>Oct 3, 2024</time>
<div class="title">
Geometric predicates for unconditional elastodynamics simulation
</div>
<div class="speaker">
Daniele Panozzo, Courant Institute of Mathematical Sciences, New York University
</div>
</summary>
<div class="abstract">
<p>
The numerical solution of partial differential equations (PDE) is
ubiquitously used for physical simulation in scientific computing and
engineering. Ideally, a PDE solver should be opaque: the user provides
as input the domain boundary, boundary conditions, and the governing
equations, and the code returns an evaluator that can compute the
value of the solution at any point of the input domain. This is
surprisingly far from being the case for all existing open-source or
commercial software, despite the research efforts in this direction
and the large academic and industrial interest. To a large extent,
this is due to lack of robustness in geometric algorithms used to
create the discretization, detect collisions, and evaluate element
validity.
</p>
<p>
I will present the incremental potential contact simulation paradigm,
which provides strong robustness guarantees in simulation codes,
ensuring, for the first time, validity of the trajectories accounting
for floating point rounding errors over an entire elastodynamic
simulation with contact. A core part of this approach is the use of a
conservative line-search to check for collisions between geometric
primitives and for ensuring validity of the deforming elements over
linear trajectories.
</p>
<p>
I will discuss both problems in depth, showing that SOTA approaches
favor numerical efficiency but are unfortunately not robust to
floating point rounding, leading to major failures in simulation. I
will then present an alternative approach based on judiciously using
rational and interval types to ensure provable correctness, while
keeping a running time comparable with non-conservative methods.
To conclude, I will discuss a set of applications enabled by this
approach in microscopy and biomechanics, including traction force
estimation on a live zebrafish and efficient modeling and simulation
of fibrous materials.
</p>
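<p>
The failure mode is easy to reproduce (an editor's classic 2D example,
not the talk's 3D predicates): the sign of a small determinant decides
which side of the line through a and b the point c lies on, and one
rounded product is enough to lose it.
</p>
<pre><code>from fractions import Fraction

def orient2d(a, b, c, num=float):
    """Sign of the cross product (b - a) x (c - a): which side is c on?"""
    ax, ay = map(num, a)
    bx, by = map(num, b)
    cx, cy = map(num, c)
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)

a = (0.0, 0.0)
b = (2.0**27, 2.0**27 - 1)
c = (2.0**27 + 1, 2.0**27)            # strictly to the left of line ab

print(orient2d(a, b, c, float))       # 0.0: one product rounds, the sign is lost
print(orient2d(a, b, c, Fraction))    # 1:   exact rationals get it right
</code></pre>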
</div>
</details>
<details>
<summary>
<time>Sep 5, 2024</time>
<div class="title">
Designing type systems for rounding error analysis
</div>
<div class="speaker">
Ariel Kellison, Cornell University
</div>
</summary>
<div class="abstract">
<p>
A central challenge in scientific computing is managing the tradeoff between accuracy
and performance. Scientific applications often require high precision to produce
meaningful results, but achieving this precision can reduce computational speed
and efficiency. For example, using higher precision arithmetic can minimize rounding
errors and improve the reliability of results, but it typically demands more processing
power and memory, which negatively affects performance. As a result, scientific software
developers must carefully balance the need for accuracy with the need for acceptable
performance.
</p>
<p>
In recent years, programming languages like Rust have demonstrated how carefully
designed type systems can help developers write high-performance code while ensuring
critical properties, such as memory safety. In this talk, we will present our work on
designing type systems that provide guarantees about the accuracy of floating-point
computations. By introducing types that can quantitatively represent the error bounds
of floating-point computations, we can create statically typed programming languages
that alert developers to potentially significant inaccuracies early in the development
process.
</p>
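<p>
As a run-time caricature of the idea (an editor's sketch; the talk's
type systems establish these bounds statically, so no extra work
happens at run time), each value can carry a bound on its accumulated
relative error, with every operation contributing one rounding:
</p>
<pre><code>from dataclasses import dataclass

U = 2.0**-53                   # unit roundoff of binary64

@dataclass
class Approx:
    val: float                 # computed value
    err: float = 0.0           # bound on accumulated relative error

    def __mul__(self, other):
        # To first order, relative errors add under multiplication,
        # plus one rounding for the multiply itself.
        return Approx(self.val * other.val, self.err + other.err + U)

x = Approx(1.1)
y = Approx(2.2)
z = x * y * y                  # two multiplications of exact inputs
print(z.val, z.err)            # err == 2 * U
</code></pre>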
</div>
</details>
<details>
<summary>
<time>
Jul 11, 2024
</time>
<div class="title">
<a href="talks/fptalks24.html">FPTalks 2024 annual workshop</a>
</div>
<div class="speaker">
12 speakers at the cutting edge of numerics research
</div>
</summary>
</details>
<details>
<summary>
<time>
Jun 6, 2024
</time>
<div class="title">
On the precision loss in approximate homomorphic encryption
</div>
<div class="speaker">
Rachel Player, Royal Holloway University of London
</div>
</summary>
<div class="abstract">
<p>
Since its introduction at Asiacrypt 2017, the CKKS approximate homomorphic encryption
scheme has become one of the most widely used and implemented homomorphic encryption
schemes. Due to the approximate nature of the scheme, application developers using
CKKS must ensure that the evaluation output is within a tolerable error of the corresponding
plaintext computation. Choosing appropriate parameters requires a good understanding of
how the noise will grow through the computation. A strong understanding of the noise
growth is also necessary to limit the performance impact of mitigations to the attacks
on CKKS presented by Li and Micciancio (Eurocrypt 2021).
</p>
<p>
In this work we present a
comprehensive noise analysis of CKKS that considers noise coming from both the encoding and
homomorphic operations. Our main contribution is the first average-case analysis for CKKS
noise, and we also introduce refinements to prior worst-case noise analyses. We develop
noise heuristics both for the original CKKS scheme and the RNS variant presented
at SAC 2018. We then evaluate these heuristics by comparing the predicted noise growth
with experiments in the HEAAN and FullRNS-HEAAN libraries, and by comparing with a
worst-case noise analysis as done in prior work. Our findings show mixed results: while
our new analyses lead to heuristic estimates that more closely model the observed noise
growth than prior approaches, the new heuristics sometimes slightly underestimate the
observed noise growth. This evidences the need for implementation-specific noise analyses
for CKKS, which recent work has shown to be effective for implementations of similar schemes.
This is joint work with Anamaria Costache, Benjamin R. Curtis, Erin Hales, Sean Murphy,
and Tabitha Ogilvie.
</p>
</div>
</details>
<details>
<summary>
<time>
May 2, 2024
</time>
<div class="title">
ReLU hull approximation
</div>
<div class="speaker">
Zhongkui Ma, The University of Queensland
</div>
</summary>
<div class="abstract">
<p>
Neural networks have offered distinct advantages over traditional
techniques. However, the opaque neural networks render their