

#### 402.06.05 Correlator Trigger Technical Overview

#### Richard Cavanaugh HL LHC CMS Detector Upgrade Director's CD-1 Review 4 April, 2018





#### Scope of Correlator Trigger

- WBS Structure
- Conceptual Design
  - Requirements and Performance
- Hardware platform (See also W.H.Smith's slides)
- R&D Programme
  - Algorithm R&D
  - Hardware R&D
  - Firmware R&D
  - Software R&D



#### Scope



















































R.Cavanaugh HL-LHC CD-1 Director's Review








4/4/18

#### HL-LHC Trigger must deal with new challenges

- Increased data compared with Phase-1
  - Barrel Calorimeter: 25x increase over current
  - Tracking information: new objects available
  - Endcap Calorimeter: 3D High Granularity, enables Particle Flow calorimeter reconstruction
- Increased processing compared with Phase-1
  - Match tracking info with fine grain calo info
  - Fit muon and track data together
  - More complex objects, conditions, & algorithms
  - Finer-grained PU mitigation
- Input data and algorithm processing driving design & HW choices










CMS has good ability to discriminate between individual particles and between particle types








# Schematic of HL-LHC Trigger






- CERN-LHCC-2015-10 & CERN-LHCC-2017-13 Prototype L1 Menu inspired from Phase-1
- Desire pT thresholds to be O(20-40) GeV
- HL-LHC 140 pile-up events per beam crossing:
  - No tracking at L1: rate ≈ 1 500 kHz
  - Tracking at L1: rate ≈ 260 kHz
- HL-LHC 200 pile-up events per beam crossing
  - No tracking at L1: rate ≈ 4 000 kHz
  - Tracking at L1: rate ≈ 500 kHz
- Allow 50% margin (monitor trigs + uncertainty)
  - Max allowed design rate = 750 kHz
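
The margin arithmetic in the list above can be checked in a few lines. This is a minimal sketch: the function name is hypothetical, and reading "50% margin" as a factor of 1.5 applied to the estimated menu rate with tracking at pile-up 200 is an assumption that happens to reproduce the quoted 750 kHz.

```python
def design_rate_khz(menu_rate_khz, margin=0.5):
    """Estimated menu rate scaled up by a fractional safety margin."""
    return menu_rate_khz * (1.0 + margin)

# Estimated L1 menu rates quoted above, in kHz: (no tracking, with tracking).
rates = {140: (1500.0, 260.0), 200: (4000.0, 500.0)}

# Rate reduction factor obtained by adding tracking at each pile-up value.
reduction = {pu: no_tk / tk for pu, (no_tk, tk) in rates.items()}

# Design rate: with-tracking estimate at pile-up 200 plus the 50% margin.
max_design_rate = design_rate_khz(rates[200][1])
```

Under these assumptions tracking buys roughly a factor 6 to 8 in rate, and the margin is what sets the 750 kHz design limit rather than the bare menu estimate.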

Main Conclusions:

- Lepton, photon HL-LHC thresholds within O(20-40) GeV range
- Hadronic algorithms need more work to get within O(20-40) GeV range

L1 trigger menu with L1 tracks ($p_T > 2$ GeV) correlated with the trigger object, for $L = 5.6 \times 10^{34} \text{ cm}^{-2} \text{s}^{-1}$, $\langle PU \rangle = 140$ and $L = 8.0 \times 10^{34} \text{ cm}^{-2} \text{s}^{-1}$, $\langle PU \rangle = 200$:

| Trigger algorithm | Rate [kHz], $\langle PU \rangle = 140$ | Rate [kHz], $\langle PU \rangle = 200$ | Offline threshold(s) [GeV] |
|---|---|---|---|
| Single Mu (tk) | 14 | 27 | 18 |
| Double Mu (tk) | 1.1 | 1.2 | 14, 10 |
| Ele\* (iso tk) + Mu (tk) | 0.7 | 0.2 | 19, 10.5 |
| Single Ele\* (tk) | 16 | 38 | 31 |
| Single iso Ele\* (tk) | 13 | 27 | 27 |
| Single $\gamma$\* (tk-iso) | 31 | 19 | 31 |
| Ele\* (iso tk) + $e/\gamma$\* | 11 | 7.3 | 22, 16 |
| Double $\gamma$\* (tk-iso) | 17 | 5 | 22, 16 |
| Single Tau (tk) | 13 | 38 | 88 |
| Tau (tk) + Tau | 32 | 55 | 56, 56 |
| Ele\* (iso tk) + Tau | 7.4 | 23 | 19, 50 |
| Tau (tk) + Mu (tk) | 5.4 | 6 | 45, 14 |
| Single Jet | 42 | 69 | 173 |
| Double Jet (tk) | 26 | 43 | 2@136 |
| Quad Jet (tk) | 12 | 45 | 4@72 |
| Single ele\* (tk) + Jet | 15 | 15 | 23, 66 |
| Single Mu (tk) + Jet | 8.8 | 12 | 16, 66 |
| Single ele\* (tk) + $H_{\rm T}^{\rm miss}$ (tk) | 10 | 45 | 23, 95 |
| Single Mu (tk) + $H_{\rm T}^{\rm miss}$ (tk) | 2.7 | 8 | 16, 95 |
| $H_{\rm T}$ (tk) | 13 | 24 | 350 |
| Rate for above triggers\* | 180 | 305 | |
| Est. rate (full EG eta range) | | 390 | |
| Est. total L1 menu rate ($\times$ 1.3) | 260 | 500 | |




































1. use tracking info
2. look around neutrals
3. remove "0" neutrals
4. assign fractional weight to ambiguous cases










#### 402.06.05.01 (Correlator L1 Trigger - CORL1)

- includes design, engineering, and technical labor, as well as M&S, to produce the electronic boards that perform particle-level event reconstruction and pileup mitigation:
  - procurement of the optical components, FPGAs, memories, and other components;
  - management and engineering support of the board production;
  - fabrication of the PCBs and assembly of the finished electronics.

#### 402.06.05.03 (Correlator Trigger Inf. & Int. - CORI)

- includes all design, engineering, and technical labor to produce, monitor, and control the Correlator L1 Trigger infrastructure.
  - all labor required to design, configure, and test crates, fibres, patch panels and the DTH card that provides the DAQ and clock/control/trigger interfaces
  - all labor required to install and integrate the CORL1 system.





## **Conceptual Design**

### Design Considerations for WBS 402.06.05

- Trigger with the highest possible efficiency (target Phase-1 efficiencies).
  - leptons, photons, jets, inclusive quantities, e.g. missing transverse momentum
- Accomplish this performance within the constraints:
  - shortest possible latency
  - total trigger rate of less than 750 kHz for pileup of 200 collisions/crossing
  - process input data provided by upstream trigger primitive logic
  - provide output data meeting specification of downstream trigger logic
- The Correlator Trigger system needs to:
  - Process trigger primitive information from five separate input systems:
    - Track-finder Trigger (TFT)
    - Endcap Calorimeter Trigger Primitive Generator (ECT)
    - Barrel Calorimeter Trigger (BCT) + HCAL Forward Trigger Primitive Generator (HF TPG)
    - Endcap Muon Track-finder Trigger (EMTF)
    - Barrel Muon Track-finder Trigger (BMTF)
  - Complete all calculations within assigned portion of total latency allowed
  - Provide pileup mitigated trigger data on 16 Gb/s fiber links for further Correlator Trigger processing that forms trigger objects sent to the Global Trigger

## Trigger Performance Goals

Figure (CMS Simulation): jet energy resolution for Calo and PF jets, compared with offline jets; anti-$k_T$, $R = 0.4$, $|\eta^{\text{Ref}}| < 1.3$.

- Ultimate goal is to reach HLT and offline reconstruction performance at the L1 Trigger
  - Increasing efficiency of the reconstruction
  - Sharpening the trigger efficiency





- Efficient track reconstruction to identify and measure charged hadrons
  - HL-LHC upgrade: available at L1 for 1st time
    - Baseline:  $p_T > 2 \text{ GeV}, |\eta| < 2.4$
- Finely segmented calorimeter information, to separate charged from neutral particles
  - HL-LHC upgrade: available at L1 for 1st time
    - Barrel: crystal-level ECAL information
    - Endcaps: high-granularity calorimeter information
- Enough processing resources

# Latency budgets for the HL-LHC Trigger

- Full Correlator Trigger must complete all processing & transmit trigger objects {μ,e,γ,τ,j,MET,etc} to the GT within 2.5 μs.
- CORL1 must complete its processing of pileup mitigated candidates {μ,e,γ,h<sup>±</sup>,h<sup>0</sup>,vtx} in advance of 2.5 μs.
  Charge #1
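
For pipeline design it is often convenient to express the latency budget in bunch crossings (BX). The conversion below is a minimal sketch; the 25 ns bunch spacing is the standard LHC value and is not stated on this slide, and the function name is hypothetical.

```python
BX_NS = 25.0  # LHC bunch-crossing spacing in ns (40 MHz), assumed here

def latency_in_bx(latency_us):
    """Convert a latency in microseconds to a number of bunch crossings."""
    return latency_us * 1000.0 / BX_NS

# The 2.5 us Correlator Trigger budget expressed in bunch crossings.
budget_bx = latency_in_bx(2.5)
```

With this spacing the full Correlator Trigger budget corresponds to 100 BX, of which CORL1 may use only a part.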





- Design CORL1 system using existing or under-development technologies (Advanced Processor – AP)
  - FPGAs: Xilinx Ultrascale and Ultrascale+ families.
  - Optics: Samtec Firefly modules, 100 Mbps to 16 Gbps.
    - Either 12 transmitters or 12 receivers per module.
    - 14.1 Gbps modules already available, 16 Gbps under development.
    - Each link allows up to 352 bits/BX of data payload, assuming 16 Gbps, 64b/66b encoding and 32 bits/packet reserved for protocol (option → 20)
  - ATCA: Advanced Telecommunications Computing Architecture
  - Build upon Phase-1 experience with hardware, firmware, software
- Close ties between algorithm development, simulation studies, firmware and software development and design engineering to provide a hardware platform for High-Luminosity LHC physics.
  - Exploit new High Level Synthesis (HLS) tools (later slides)
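
The 352 bits/BX payload quoted above can be reproduced under one reading of the stated assumptions. This sketch assumes that only whole 66-bit encoded blocks fitting into one 25 ns bunch crossing are counted, each carrying 64 data bits; that reading, the function name, and the parameter names are illustrative assumptions, not the documented link protocol.

```python
def payload_bits_per_bx(line_rate_gbps, bx_ns=25.0, word_bits=64,
                        coded_bits=66, protocol_bits=32):
    """Data payload per bunch crossing on one serial link."""
    raw_bits = line_rate_gbps * bx_ns          # Gb/s * ns = bits per BX
    whole_blocks = int(raw_bits // coded_bits)  # complete 66-bit blocks per BX
    return whole_blocks * word_bits - protocol_bits

payload_16g = payload_bits_per_bx(16.0)
```

At 16 Gbps this gives 400 raw bits per BX, six complete 64b/66b blocks (384 data bits), and 352 bits after the 32 protocol bits are subtracted.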



### Start with a tiled multi-layer architecture where:

- Layer-1 (this WBS) performs Particle-Flow (PF) Reconstruction, Vertex Finding (VTX), Pile-Up Per Particle Identification (PUPPI) and Mitigation.
- Layer-2 (not in scope) uses Layer-1 to form the highest efficiency, highest purity trigger objects.
- Use the following Trigger Board Specifications
  - Xilinx Ultrascale+ VU9P FPGA, "-2" speed grade
    - DSP: 6840; FF: 2364k; LUT: 1182k; clk: 320 MHz
  - Xilinx C2104 package:
    - Max of 104 (input, output) optical links at 16 Gb/s
    - 96 (input, output) links available for data







































## Input Bandwidth to the CORL1 System

#### From Interim Technical Design Report, CMS-TDR-017


| Input       | Object  | N bits/object | N objects | N bits/BX | Total BW (Gb/s) | Number of 16 Gb/s links |
|-------------|---------|---------------|-----------|-----------|-----------------|-------------------------|
| Tracker     | Track   | 100           | 900       | 90 000    | 3 600           | 1 296 (prev. slide)     |
| Barrel Calo | Cluster | 16            | 2 448     | 39 168    | 1 567           | 216 (cluster + tower)   |
| Barrel Calo | Tower   | 32            | 612       | 19 584    | 783             |                         |
| HF          | Tower   | 10            | 1 440     | 14 400    | 553             | 40                      |
| Endcap Calo | Cluster | 128           | 400       | 51 200    | 1 600           | 311 (cluster + tower)   |
| Endcap Calo | Tower   | 16            | 2 400     | 38 400    | 1 536           |                         |
| Barrel Muon | Track   | 64            | 36        | 2 304     | 92              | 35 (barrel + endcap)    |
| Endcap Muon | Track   | 64            | 36        | 2 304     | 92              |                         |
| Total       |         |               |           |           | 9 819           | 1 898                   |
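
The bandwidth column follows from the per-object sizes and multiplicities once a bunch-crossing rate is fixed. The sketch below recomputes it for a few rows, assuming the standard 40 MHz LHC crossing rate (not stated in the table); the function and dictionary names are hypothetical.

```python
BX_RATE_MHZ = 40.0  # LHC bunch-crossing rate, assumed here

def bandwidth_gbps(bits_per_object, n_objects):
    """Input bandwidth in Gb/s for one object type."""
    bits_per_bx = bits_per_object * n_objects
    # bits/BX * MHz gives Mb/s; divide by 1000 for Gb/s.
    return bits_per_bx * BX_RATE_MHZ / 1000.0

# (bits per object, objects per BX) for a few rows of the table.
rows = {
    "Tracker track": (100, 900),
    "Barrel Calo cluster": (16, 2448),
    "Barrel Calo tower": (32, 612),
    "Barrel Muon track": (64, 36),
}
bw = {name: bandwidth_gbps(*spec) for name, spec in rows.items()}
```

For these rows the result rounds to the tabulated 3 600, 1 567, 783, and 92 Gb/s.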

- Total BW into the CORL1 system is about 9.8 Tb/s
- Split into 3 eta-divisions: endcap(-), barrel, endcap(+)
  - Barrel Calo:
    - 3 GCT boards (120° wedges) each with 72 output links, covers 9 CORL1 boards (40° wedges) = 24 (GCT) links per CORL1 board
  - Endcaps Calo ("+" and "-"):
    - 311 (EC) links / (2 x 9 φ-sectors) + 40 (HF) links / (2 x 9 φ-sectors) ≈ 18 (EC) + 3 (HF) links per CORL1 board
  - Muons: 2 links per CORL1 board
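
The per-board link counts above are simple divisions of the total link counts over the φ-sectors. A minimal sketch follows, assuming that fractional results are rounded up so every board receives a whole number of links; the function name is hypothetical.

```python
import math

def links_per_board(total_links, n_boards):
    """Whole links per CORL1 board when total_links are shared evenly."""
    return math.ceil(total_links / n_boards)

barrel_gct = links_per_board(3 * 72, 9)   # 3 GCT boards x 72 links over 9 barrel boards
endcap_ec = links_per_board(311, 2 * 9)   # EC links over 2 endcaps x 9 phi-sectors
endcap_hf = links_per_board(40, 2 * 9)    # HF links over 2 endcaps x 9 phi-sectors
```

Rounding up is why 311/18 ≈ 17.3 becomes 18 EC links and 40/18 ≈ 2.2 becomes 3 HF links per board.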



#### Design Considerations for 402.06.05

|     | E-                 | В          | E+                 |
|-----|--------------------|------------|--------------------|
| φ-1 | 42 (TFT) +         | 42 (TFT) + | 42 (TFT) +         |
|     | 18 (EC) + 3 (HF) + | 24 (GCT) + | 18 (EC) + 3 (HF) + |
|     | 2 (EMTF)+          | 2 (BMTF)+  | 2 (EMTF)+          |
|     | 2 (VTX)            | 2 (VTX)    | 2 (VTX)            |
| φ-2 | 42 (TFT) +         | 42 (TFT) + | 42 (TFT) +         |
|     | 18 (EC) + 3 (HF) + | 24 (GCT) + | 18 (EC) + 3 (HF) + |
|     | 2 (EMTF)+          | 2 (BMTF)+  | 2 (EMTF)+          |
|     | 2 (VTX)            | 2 (VTX)    | 2 (VTX)            |
| φ-3 | 42 (TFT) +         | 42 (TFT) + | 42 (TFT) +         |
|     | 18 (EC) + 3 (HF) + | 24 (GCT) + | 18 (EC) + 3 (HF) + |
|     | 2 (EMTF)+          | 2 (BMTF)+  | 2 (EMTF)+          |
|     | 2 (VTX)            | 2 (VTX)    | 2 (VTX)            |
| φ-4 | 42 (TFT) +         | 42 (TFT) + | 42 (TFT) +         |
|     | 18 (EC) + 3 (HF) + | 24 (GCT) + | 18 (EC) + 3 (HF) + |
|     | 2 (EMTF)+          | 2 (BMTF)+  | 2 (EMTF)+          |
|     | 2 (VTX)            | 2 (VTX)    | 2 (VTX)            |
| φ-5 | 42 (TFT) +         | 42 (TFT) + | 42 (TFT) +         |
|     | 18 (EC) + 3 (HF) + | 24 (GCT) + | 18 (EC) + 3 (HF) + |
|     | 2 (EMTF)+          | 2 (BMTF)+  | 2 (EMTF)+          |
|     | 2 (VTX)            | 2 (VTX)    | 2 (VTX)            |
| φ-6 | 42 (TFT) +         | 42 (TFT) + | 42 (TFT) +         |
|     | 18 (EC) + 3 (HF) + | 24 (GCT) + | 18 (EC) + 3 (HF) + |
|     | 2 (EMTF)+          | 2 (BMTF)+  | 2 (EMTF)+          |
|     | 2 (VTX)            | 2 (VTX)    | 2 (VTX)            |
| φ-7 | 42 (TFT) +         | 42 (TFT) + | 42 (TFT) +         |
|     | 18 (EC) + 3 (HF) + | 24 (GCT) + | 18 (EC) + 3 (HF) + |
|     | 2 (EMTF)+          | 2 (BMTF)+  | 2 (EMTF)+          |
|     | 2 (VTX)            | 2 (VTX)    | 2 (VTX)            |
| φ-8 | 42 (TFT) +         | 42 (TFT) + | 42 (TFT) +         |
|     | 18 (EC) + 3 (HF) + | 24 (GCT) + | 18 (EC) + 3 (HF) + |
|     | 2 (EMTF)+          | 2 (BMTF)+  | 2 (EMTF)+          |
|     | 2 (VTX)            | 2 (VTX)    | 2 (VTX)            |
| φ-9 | 42 (TFT) +         | 42 (TFT) + | 42 (TFT) +         |
|     | 18 (EC) + 3 (HF) + | 24 (GCT) + | 18 (EC) + 3 (HF) + |
|     | 2 (EMTF)+          | 2 (BMTF)+  | 2 (EMTF)+          |
|     | 2 (VTX)            | 2 (VTX)    | 2 (VTX)            |

- 27 CORL1 boards for matching and PF+PUPPI processing
  - Nicely fits TFT φ-sectors
  - 67-70 links per CORL1 board
  - Fits well within 96 input link C2104 package for APT
  - only 2 distinct algo firmware versions required (barrel, endcap)
- 2 CORL1 boards for VTX processing (TMUX=2)
  - 81 input links; 27 output links
  - Fits well within 96 input link C2104 package for APT
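
The per-board totals and the remaining headroom against the 96 data links of the C2104 package can be tallied directly from the link map. This is a small sketch; the dictionary layout and names are illustrative only.

```python
MAX_DATA_LINKS = 96  # input links available for data in the C2104 package

# Input links per CORL1 board, taken from the link map above.
board_inputs = {
    "barrel": {"TFT": 42, "GCT": 24, "BMTF": 2, "VTX": 2},
    "endcap": {"TFT": 42, "EC": 18, "HF": 3, "EMTF": 2, "VTX": 2},
}

totals = {kind: sum(links.values()) for kind, links in board_inputs.items()}
headroom = {kind: MAX_DATA_LINKS - total for kind, total in totals.items()}

# The two VTX boards each take 81 input links.
vtx_headroom = MAX_DATA_LINKS - 81
```

Barrel boards use 70 links and endcap boards 67, leaving 26 and 29 spare inputs respectively; the VTX boards leave 15.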



#### Algorithm R&D

- Ensure performance of algorithms implemented in design
- Refine requirements for design performance

#### Hardware R&D

- ATCA technology trigger card demonstrator
- Correlator Trigger system demonstrator: Detector TPGs → Correlator L1 Trigger → Correlator L2

#### Firmware R&D

- High Level Synthesis of trigger algorithms
- Trigger Card Infrastructure Firmware

#### Software R&D

- Control Infrastructure
- Monitoring and Diagnostics Software



## Algorithm R&D using HLS Tools

- HLS is an automated design process
  - interprets algorithm specification at a high abstraction level
  - creates digital hardware/RTL code that implements that behavior.
- HLS significantly accelerates design time
  - keeps full control over the choice of architecture exploration, level of parallelism and implementation constraints.
  - reduces overall verification effort
- Using Xilinx Vivado HLS
  - Complete design environment with abundant possibilities in the form of pragma directives to fine-tune hardware generation process from High Level Language (HLL) to Hardware Description Languages (HDL)
  - Packages implementation files as an IP block for use with other tools in the Xilinx design flow.
  - C/C++ libraries contain functions and constructs optimized for implementation in an FPGA.
  - Using these libraries helps to ensure high Quality of Results (QoR)



Particle-flow (PF) + Vertex Finding (VTX) + Pileup Per Particle Identification (PUPPI)






#### Example sub-workflow with HLS



#### Algorithm R&D: Early results using HLS

- Early PF+PUPPI algorithms prototyped in firmware using Vivado High Level Synthesis
  - Produces RTL, which is then simulated on a SW test bench

| Scheme (#EM, CAL, TK, MU) | Latency | Clock | Regions in FPGA | II (pipeline) | DSP | FF [k] | LUT [k] |
|---|---|---|---|---|---|---|---|
| 20, 20, 25, 4 | 553 ns | 320 MHz | 2 | 4 | 2335 | 324 | 414 |

- 2 PF regions (= 2 IP-cores ≈ 40% utilization):
  - PF Algorithm: DSP: 2335; FF: 324k; LUT: 414k
  - VU9P FPGA: DSP: 6840; FF: 2364k; LUT: 1182k
  - Clock at 320 MHz
- (2 PF regions) x (2 pipelines/BX) = 4 detector regions ($0.7\phi \times 0.5\eta$) per card
- Latency = $0.553\,\mu s$ (well within $2.5\,\mu s$ total budget)
- Estimate of total number of CORL1 cards needed for PF+PUPPI:
  - (~100 detector regions) / (4 detector regions per card) ≈ 25 CORL1 cards
  - Fits within the 27 CORL1 cards needed to map onto TFT
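
The utilization and card-count estimates on this slide can be reproduced from the quoted numbers. The sketch below assumes that the PF resource figures are totals for the two PF regions on one FPGA; that reading and the variable names are assumptions for illustration.

```python
import math

# Resources quoted on the slide: VU9P capacity and the early PF prototype.
vu9p = {"DSP": 6840, "FF_k": 2364, "LUT_k": 1182}
pf_two_regions = {"DSP": 2335, "FF_k": 324, "LUT_k": 414}

# Fraction of each FPGA resource used by the PF prototype.
utilization = {key: pf_two_regions[key] / vu9p[key] for key in vu9p}

# (PF regions per FPGA) x (pipelines per BX) detector regions per card.
regions_per_card = 2 * 2
cards_needed = math.ceil(100 / regions_per_card)  # ~100 detector regions

# Share of the 2.5 us latency budget used by the 0.553 us pipeline.
latency_fraction = 0.553 / 2.5
```

DSP and LUT usage come out at roughly a third of the device, in line with the approximate 40% quoted, and the pipeline uses about a fifth of the latency budget.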




## Algorithm R&D using Gen-0 Teststand

- Benefit from recent Phase 1 upgrade experience
  - Virtex-7 µTCA and ATCA cards are a very capable "Gen-0" demonstrator
  - R&D: Track Finder Trigger, Calorimeter Trigger, Muon Trigger, and Correlator Trigger
- Benefit of Embedded Linux
  - Functional Linux system (network, file system, shell)
  - Xilinx Virtual Cable XVC (e.g. JTAG)
  - Debug board remotely via TCP/IP as if on bench in lab
- Benefit of Advanced eXtensible Interface (AXI) Architecture
  - Reduces learning curve & integration
  - Industry standard access to Xilinx IP
  - 95% generic infrastructure from ZYNQ hardcore and Xilinx IP, no custom HDL needed—it's all in the tools!







#### Algorithm R&D using Ultrascale+ Dev. Kit

- Very early look at VU9P FPGA
- Xilinx Development Kit includes
  - USB JTAG cable for programming
  - Gigabit Ethernet



#### Prototype PF Algorithm implemented using HLS

- Inputs read from BRAM buffers
- Early example: 10 EM-clusters, 10 HAD-clusters, 10 tracks
- Output captured to BRAM buffers

Figure: FPGA floorplan of the implemented design (clock regions X0Y0 to X5Y14).




#### Missing Transverse Momentum

- About a factor 2 (6) less rate, compared with track-based MET (CaloMET), for the same trigger efficiency

#### Summed Jet Transverse Momenta

- About 15% (45%) lower trigger threshold, compared with track-based HT (CaloHT), for the same efficiency and fixed trigger rate





#### HLS4ML: High Level Synthesis for Machine Learning

 Machine learning algorithms are ubiquitous in HEP and CMS (mostly for offline or at HLT)



FPGA structures map nicely onto ML computations

hls4ml: neural network translation library for HLS

- Supports common ML workflows and architectures
  - Keras, TensorFlow, PyTorch
  - Convolutional layers, recurrent layers
- Tunable configuration for different use cases
  - precision, reuse factors, etc

#### HLS4ML: High Level Synthesis for Machine Learning

- Motivation for fine-grained PF input to trigger objects
  - Jet substructure and tagging
- Jet substructure & object tagging at Level-1
  - 5-output multi-class classifier
    - does a jet originate from a quark, gluon, W/Z boson, top quark?
  - Fully connected network
    - compressed/pruned
  - 16 inputs
    - currently expert: jet mass, multiplicity, energy correlation functions, etc.
    - investigating non-expert quantities

Credit: Javier Duarte, hls4ml



| Reuse = 1 | BRAM | DSP | FF  | LUT |
|-----------|------|-----|-----|-----|
| Total     | 13   | 954 | 53k | 36k |
| % Usage   | ~0%  | 17% | 3%  | 5%  |
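
The "Reuse = 1" label refers to the reuse factor, the number of times each hardware multiplier is reused per inference; a factor of 1 is fully parallel. The sketch below shows how the multiplier count of a fully connected network scales with that factor. Only the 16 inputs and 5 outputs come from the slides; the hidden-layer widths and the function names are hypothetical, and the counts ignore the compression and pruning mentioned above, so they are not expected to match the table.

```python
import math

def multipliers_needed(n_in, n_out, reuse_factor=1):
    """Multipliers instantiated for one dense layer at a given reuse factor."""
    return math.ceil(n_in * n_out / reuse_factor)

# 16 inputs and 5 outputs from the slides; hidden widths are hypothetical.
layers = [(16, 64), (64, 32), (32, 32), (32, 5)]

def total_multipliers(reuse_factor):
    return sum(multipliers_needed(n_in, n_out, reuse_factor)
               for n_in, n_out in layers)

fully_parallel = total_multipliers(1)
reuse_four = total_multipliers(4)
```

Raising the reuse factor trades multipliers (and hence DSP usage) for extra clock cycles per inference, which is the kind of tuning the "precision, reuse factors" configuration exposes.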




## Algorithm R&D Milestones

- Q1 2018: Release of software emulator for some v.0 Correlator algorithms
- Q2 2018: Delivery of HLS-based testbench simulator for some v.0 Correlator algorithms
- Q3 2018: Estimate of FPGA resource usage & latency for a subset of v.0 Correlator algorithms
- Q4 2018: Completion of initial hardware tests & demonstration of some v.0 Correlator algorithms
- Q4 2018: Release of software emulator for v.1 Correlator algorithms
- Q1 2019: Delivery of HLS-based testbench simulator for v.1 Correlator algorithms
- Q2 2019: Estimate of FPGA resource usage & latency for a specified set of v.1 Correlator algorithms
- Q3 2019: Completion of hardware tests and demonstration of v.1 Correlator algorithms



## Hardware R&D: Demonstrator

- Explore hardware technologies targeted for the Phase 2 upgrade
  - ATCA Form Factor including Rear Transition Module
  - MGT Link design beyond 10G line rates (16G, 25G)
  - Efficient cooling of next-gen FPGAs
  - Next generation IPMI and embedded Linux solutions
  - Advanced RAM/FPGA interconnections (U. Florida)
- General ATCA technology demonstrator, with emphasis on Trigger applications
  - Powerful performance with flexibility
  - Closely related to the ECAL Demonstrator
- Specifications:
  - Single FPGA Design, C2104 Package
  - ≥ 100 optical links (Firefly optical modules)
    - 14/16G with options to test 25G links as well.
  - Approximately 24 links to RTM for enhanced versatility
    - RTM includes some of the optical links above
  - Embedded Linux and IPMI Controller on Mezzanines
  - Deep Memory Mezzanine
- Test the full chain
  - TPGs → Correlator L1 Trigger → Correlator L2







## Hardware R&D: Links and Memory



#### Samtec Firefly Optical links

- 14 Gb/s and 28 Gb/s tested
- Error-free transmission all the way up to 28 Gb/s
- Can also be used on RTMs
- Molex Impel Connectors
  - Can handle up to 40 Gb/s
- DDR4 as Large Memory Bank (tested 16 GB)
  - Low cost, low power, huge memory
  - fast, but some latency: 6-12 BX








- APd1 (Advanced Processor demonstrator #1):
  - APx-family card for Phase 2 Trigger: Calorimeter, Correlator, Muon.
  - Demonstrator for a multi-purpose, customizable, common processing platform, suitable for wide-scale use in CMS back end and trigger subsystems
  - Extension of the popular and successful CTP7\*-style architecture (Linux & ZYNQ/Virtex) into ATCA on ZYNQ/Virtex Ultrascale/+
  - Customizable via high performance Rear Transition Modules (RTMs) and memory mezzanines
- Single Virtex Ultrascale+ VU9P device per board
  - XCVU9P-compatible, C2104 package
  - Optics: Samtec Firefly Modules with either 12 transmitters or 12 receivers per module (up to 16 Gbps) and 4 transmitter plus 4 receiver modules (up to 28 Gbps)
- In design now
- Specs written for:
  - Large LUT Mezzanine Interface and RTM Interface
  - Control Interfaces (ELM, IPMC, 1G/10G Ethernet)
  - Power Distribution and Internal Clock Distribution
- DTH Interface work in progress
  - CMS Central DAQ and Trigger/Timing/Control Interface Card





- Pooling of efforts in ATCA Processor hardware, firmware and software development
- Multiple ATCA processors and mezzanine board types
- Modular design philosophy, emphasis on platform solutions with flexibility and expandability
- Reusable circuit, firmware and software elements



# APd1+LUT+RTM Block Diagram





#### Hardware R&D Milestones

- 2018 Q2 (30-June-2018): ATCA Control Infrastructure Mezzanine First SW/FW release
- 2018 Q3 (30-September-2018): APd1 Produced
- 2018 Q4 (31-December-2018): APd1 Data connectivity test
- 2019 Q1 (31-March-2019): APd1 first FPGA firmware infrastructure release (see W.H.Smith's talk)
- 2019 Q2 (30-June-2019): UW-IPMC rev.2 design complete
- 2019 Q3 (30-September-2019): ELM2 design complete
- 2019 Q4 (31-December-2019): Subsystem Interconnect test
  - Mock Detector TPGs → Correlator L1 Trigger → Mock Correlator L2 Trigger
- 2020 Q1 (31-March-2020): APd2 design complete
- 2020 Q2 (30-June-2020): ATCA Control Infrastructure Mezzanine Second SW/FW release
- 2020 Q3 (30-September-2020): APdx second FPGA firmware infrastructure release
- 2020 Q4 (31-December-2020): Pre-production Complete



## **Correlator Trigger Technical Summary**

- Correlator L1 Trigger meets technical performance requirements
- Designs are based on similar technologies to Phase-1
- Design uses common ATCA hardware platform and components also used by other CMS systems
- Firmware + software development evolves from Phase-1
  - Uses High Level Synthesis (HLS) tools; creates efficient FW designs linked closely to algorithm simulation
- Initial R&D program prototyping demonstrates interfaces and controls







- Weak scales, the raison d'être for the HL-LHC
  - Higgs, Flavour, Gauge Hierarchy, Supersymmetry, Dark Matter
  - O(100) GeV mass scales  $\rightarrow$  O(50) GeV endpoints  $\rightarrow$  O(20-40) GeV thresholds
- Important lessons from Run 1 & 2 and Higgs discovery:
  - Offline: particle flow (PF) event reconstruction, significant resolution improvement
  - High Level Trigger (HLT):
    - PF (carefully) pushed into HLT
    - Similar Offline vs HLT objects
  - Level 1 (L1):
    - Final limitation: no tracking available
    - Dissimilar HLT vs L1 objects



- Weak-scale physics  $\rightarrow$  Large statistics  $\rightarrow$  High luminosity  $\rightarrow$  Harsh environment!
  - CMS investing in providing more and better information for L1
    - Enable similar HLT vs L1 objects: better turn-on curves, better rates
- Science potential of HL-LHC determined by datasets it collects

#### Track-matched muons

- Without L1 Tracks
  - Misassignment of high pT to low pT muons
  - Rate flattens above O(30) GeV

- Match L1 Tracks & Muons
  - Better resolution
  - Sharper turn-on
  - Large rate reduction
  - Factor O(5-10) at 20 GeV

From CMS Technical Proposal: CERN-LHCC-2015-10







#### Track-matched tau algorithms

- Tried two (early) approaches
  - start w/ calo cluster (TkCaloTaus)
    - match to tracks
    - apply track-based isolation
  - start w/ tracks (TkEmTaus)
    - match to EM-cluster
- Either algorithm able to maintain ~50 kHz rate with ~50% eff. for H to  $\tau\tau$  signal
  - Rate reduced by factor ...

Figure (CMS Simulation, Phase-2; SingleTau, VBF $H \rightarrow \tau\tau$, $\langle PU \rangle = 140$): rate vs. signal efficiency for CaloTaus, TkCaloTaus (CaloTaus and tracks), and TkEmTaus (EM and tracks). From CMS Technical Proposal: CERN-LHCC-2015-10.




# HL-LHC L1 Rates vs Thresholds



















### Photons from CMS Technical Proposal

Track-matched algorithm

- Photons
  - Isolate EM-clusters from L1 tracks
    - reduces diphoton rate by factor O(5) for 20 GeV leading photon
- Challenge: tracker material
  - Photon conversions
- We know how to deal with this:
  - Apply annulus
- Example:
  - track iso of EM
  - H to  $\gamma\gamma$  signal  $\epsilon$

From CMS Technical Proposal: CERN-LHCC-2015-10
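
One common way to "apply an annulus" is to sum track transverse momentum only in a ring around the EM cluster, so that conversion tracks very close to the cluster do not spoil the isolation. The sketch below illustrates that idea; the inner and outer radii, the dictionary-based inputs, and the function names are hypothetical and not taken from the Technical Proposal.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the phi difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def annulus_track_iso(cluster, tracks, r_min=0.05, r_max=0.3):
    """Track pT summed in r_min < dR < r_max, relative to the cluster ET."""
    sum_pt = sum(
        trk["pt"] for trk in tracks
        if r_min < delta_r(cluster["eta"], cluster["phi"],
                           trk["eta"], trk["phi"]) < r_max
    )
    return sum_pt / cluster["et"]

cluster = {"et": 20.0, "eta": 0.0, "phi": 0.0}
tracks = [
    {"pt": 8.0, "eta": 0.02, "phi": 0.0},  # conversion leg inside the inner veto
    {"pt": 2.0, "eta": 0.0, "phi": 0.2},   # nearby track inside the annulus
    {"pt": 5.0, "eta": 1.0, "phi": 0.0},   # far away, outside the outer radius
]
iso = annulus_track_iso(cluster, tracks)
```

In this toy case only the 2 GeV track falls inside the annulus, so a converted photon with a hard track on top of the cluster would still pass a loose relative-isolation cut.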






















MET determination another poster child for Particle Flow Algorithms!






