# Development of Front-End and Data Transmission Integrated Circuits for Nuclear and HEP Experiments

# by Özgür Çobanoğlu

B.Sc. (İstanbul Üniversitesi, Physics Department) 2001

M.Sc. (İstanbul Üniversitesi, Nuclear Physics Department) 2003

Adviser : Angelo Rivetti, Ph.D., VLSI Research Group, INFN of Turin

Co-Adviser : Prof. Alberto Aloisio, University of Naples "Federico II", Physics Department

A thesis presented for the degree of

Doctor of Philosophy

in

Experimental Physics, $20^{th}$ Cycle



VLSI Research Group, Department of Experimental Physics, University of Turin, Italy

November 2007

# Development of Front-End and Data Transmission Integrated Circuits for Nuclear and High Energy Physics Experiments

# Özgür Çobanoğlu

B.Sc. (İstanbul Üniversitesi, Physics Department) 2001

M.Sc. (İstanbul Üniversitesi, Nuclear Physics Department) 2003

Adviser : Dr. Angelo Rivetti, VLSI Group, INFN, Turin Section

Co-Adviser : Prof. Alberto Aloisio, Department of Physical Sciences, University of Naples "Federico II"

Thesis presented for the degree of

Doctor of Philosophy

in

Fundamental Physics, Applied Physics and Astrophysics, $20^{th}$ Cycle

VLSI Group, Department of Experimental Physics, University of Turin, Italy

November 2007

This thesis of Özgür Çobanoğlu is approved.

Adviser : Angelo Rivetti, Ph.D., VLSI Laboratory, INFN of Turin, Italy

Co-Adviser : Prof. Alberto Aloisio, University of Naples "Federico II", Physics Department, Italy

## Committee in charge :

Prof. Arnaldo Longhetto

University of Turin, Department of General Physics "Amedeo Avogadro", Italy

Prof. Alberto Pasquarelli

University of Ulm, Department of Electron Devices and Circuits, Germany

Prof. Alberto Aloisio

University of Naples "Federico II", Physics Department, Italy

University of Turin, November 2007

# To

those who dedicated their lives to teach *the truth* at their times but have not been appreciated at all, who have either been forgotten gradually throughout the history or known *very well* but wrongly.

## Development of Front-End and Data Transmission Integrated Circuits for Nuclear and HEP Experiments

## by Özgür Çobanoğlu

Doctor of Philosophy in Experimental Physics

University of Turin, November 2007

Adviser: Angelo Rivetti, Ph.D.

### Abstract

Experimental physics involves developing instrumentation and interpreting the data it produces, usually on a very large scale. From the physics phenomena down to the interpretation of the results, there is a tightly connected chain of links including detector development, design of front-end (FE) and data transmission Application Specific Integrated Circuits (ASICs), communication architectures, data acquisition (DAQ) and monitoring software, and implementation of frameworks for off-line analysis.

Modern nuclear and high energy physics (HEP) experiments require the development of custom-designed high-density ASICs. There are two main areas of HEP in which ASICs are required: the front-end and the data transmission electronics. Highly integrated custom design, especially in front-end chips, is needed because of the required detection precision in terms of time and spatial resolution. The limited space available for the electronics, a consequence of the required detector granularity, is also a severe concern. Even though almost any kind of high-performance building block is available commercially, such blocks are not optimized and integrated onto chips dense enough to meet HEP requirements. Thus, building an experimental system composed of commercial components only is usually very difficult. Custom design, particularly that of front-end ASICs, is also needed because no commercially available chips provide the specific functionality required by experimental systems. The heavily radioactive environment leads to custom design for data transmission ASICs as well, since high-performance commercial products are either not built for radiation hardness or, when they are, prohibitively expensive. Using special layout techniques to make the circuits radiation tolerant is a decision made at the cost of relatively bigger dies and, thus, slightly slower operation. In this thesis, the development of two ASICs covering the above applications is presented.

An 8-channel full-custom FE ASIC, named *the CMAD*, is designed and implemented in a commercial *350nm CMOS* technology for the binary readout of the RICH-I detector system of the COMPASS experiment at CERN. The CMAD, which has been successfully tested, amplifies the signals coming from fast multi-anode photo-multipliers and compares them against a threshold that is adjustable on-chip on a channel-by-channel basis. The CMAD is scheduled to be installed in COMPASS in 2008.

A charge-pump phase-locked loop (CP-PLL) based serializer is designed and implemented in a commercial *130nm CMOS* technology for the radiation-hard transceiver ASIC GBT13, which is under development for the upgrade of the LHC. As a possible functional extension to the GBT13, a burst-mode capable clock and data recovery (CDR) block is also designed in the same technology. Test prototypes for the building blocks are designed. At the time of writing, a limited number of test circuits have been fabricated and tested.

# Declaration

The work in this thesis is based on research carried out at the VLSI Research Group, the Department of Experimental Physics, University and INFN of Turin, Italy. No part of this thesis has been submitted elsewhere for any other degree or qualification and it is all my own work unless referenced to the contrary in the text. The CMAD development has been carried out in collaboration with the COMPASS experiment at CERN, and the GBT13 development has been carried out in collaboration with the CERN MIC group.

## Copyright © 2007 by Özgür Çobanoğlu.

"The copyright of this thesis rests with the author. No quotations from it should be published without the author's prior written consent and information derived from it should be acknowledged".

# Acknowledgments

Even though it does not provide enough space for the things I actually would like to mention, the only space within this section is what I have.

I would like to thank Angelo Rivetti for his guidance during my study at the University and INFN of Turin. His *to-the-point* recommendations have always protected me from certain pitfalls. Without his experience and willingness to teach me analog design, together with his unique *sense of humor*, I would neither have stayed on track nor would the work presented in this thesis exist.

I am grateful to Gianni Mazza for *being there* when I was in need. His consistent suggestions, which saved me time and kept me going, are priceless.

Thanks go to Michela Chiosso, for her hard work on the tests of the CMAD and her smiling face making things look easier than they actually are. I would like to acknowledge Paolo Delaurenti, for his past studies on the modeling of the front-end section of the CMAD, which I appreciated very much. Continuous support from Daniele Panzieri has made it possible for me to contribute to the development of the CMAD.

I am grateful to Paulo Moreira from the CERN microelectronics group for various discussions, especially the ones relating to phase-locking and data recovery. I have learnt many things from his deep RF experience. I would also like to acknowledge Federico Faccio for his work on the radiation hardness of VLSI technology in particular, and Alessandro Marchioro for his positive energy and continuous support, which gave me the opportunity to contribute to the GBT project.

I am glad to know Chungchieh (Steel) Yang from Chiao Tung University, Taiwan, for valuable discussions on CDR classifications which helped me understand the subject better. In my opinion, what builds up a person is his or her ability to appreciate the examples, good as well as bad ones, which life provides in vast variety. I would like to name and to thank some of those *good examples* as follows:

Sorin Martoiu for very interesting discussions, particularly on analog design, as well as for sharing the same fate, as we spent many days and nights together in front of the layout editor even though we were working on different projects.

Sorin Cheran with his energetic attitude, which boosts people, and George Catalin Serbanut for fruitful discussions on various technical issues including also the not-so-scientific ones.

Pierre Vande Vyvre, Wisla Carena, Roberto Divia, Klaus Schossmaier, Csaba Soos and ALICE DAQ group at CERN, for teaching me, with or without awareness, the things I appreciate very much and which are too many to name.

Gökhan Unel from ATLAS for being my *elder brother* at CERN and especially for sharing the same *curiosity* and *dreams*. This is what I call *luxury* in life.

M. Nizamettin Erduran, from İstanbul Üniversitesi, Türkiye, for teaching me the fundamentals of experimental nuclear physics, for not giving up when I thought I had reached my natural limit, and for showing me how much *stamina* one can gain.

Osman Karaşın, for being the very first and one of the best representatives of how to be a *serious professional*.

My parents Osman Nuri Çobanoğlu, Nimet Türker and my granny Fitnat Türker, for *letting* and *supporting* me be who I am in numerous ways; both at the same time are such rare attitudes that they must be treated as *the magic couple of gifts* from the gods.

My brother Onur Çobanoğlu, for his excellent intellectual skills as a whole which formed a significant portion of my personal growth and those extremely long-lasting discussions on *fundamentals* from which I believe we both have learnt much.

Class 11-C together with all the friends within our high school A.D.M.L. located on İstanbul Bosphorus for having *grown up* a lot during a priceless period of four years. I appreciate all my friends who had the courage to leave their homes so young to *invest* in themselves and would like to thank all my teachers, whose contributions cannot be expressed in words.

My lady or, in Angelo's own words, *the real boss*, Tanya Aycan Başer, for sharing the life, with understanding and support, and standing the difficulties of being with me.

Last but not least, I am also grateful to those who have played their roles as the bad examples, from whom I have learnt much, in particular about how not to be.

# Contents

- Abstract
- Declaration
- Acknowledgments
- 1 Introduction
  - 1.1 Front-End and Data Transmission ASICs
    - 1.1.1 The Need for Full Custom Design
  - 1.2 VLSI Design Methodology
    - 1.2.1 Design Drivers
    - 1.2.2 Bottom-Up vs. Top-Down
  - 1.3 Text Organization
- 2 The COMPASS Experiment at CERN
  - 2.1 Physics Overview
    - 2.1.1 Muon-Beam Physics Program
    - 2.1.2 Hadronic Physics Program
  - 2.2 Description of the Apparatus
    - 2.2.1 Polarized Beam
    - 2.2.2 The Spectrometer
    - 2.2.3 RICH Detection Principle
- 3 Design of CMAD
  - 3.1 Architecture of the MAD-4
    - 3.1.1 Motivations for the Upgrade
  - 3.2 Design of the CMAD
    - 3.2.1 Architecture
    - 3.2.2 Transistor Level Implementation
- 4 Test of The CMAD
  - 4.1 Prototypes
  - 4.2 Test Setup
  - 4.3 Measurement Results
- 5 CP-PLL Based Serializer for the GBT System
  - 5.1 Introduction
  - 5.2 The TTC System
    - 5.2.1 Timing
    - 5.2.2 Trigger
    - 5.2.3 Control
    - 5.2.4 Line Coding in TTC System
  - 5.3 The LHC Upgrade
    - 5.3.1 Communication Physical Layer
    - 5.3.2 Gigabit Optical Link - GOL
  - 5.4 Motivation for the Replacement of the Current System
  - 5.5 GBT Transceiver
    - 5.5.1 GBT Network Configurations
  - 5.6 PLL Based Serializer Design
    - 5.6.1 PLL Architecture
    - 5.6.2 Loop Parameter Selection
    - 5.6.3 Model Based Simulation Results
    - 5.6.4 Transistor Level Implementation
- 6 Burst-Mode CDR
  - 6.1 Introduction
    - 6.1.1 Burst-Mode Network
    - 6.1.2 CDR Classification
  - 6.2 Architecture
  - 6.3 Transistor Level Implementation
  - 6.4 Simulation Results
  - 6.5 Measurement Results
- 7 Conclusions
  - 7.1 The CMAD
    - 7.1.1 Design Motivation for the Last Prototype
    - 7.1.2 Outlook
  - 7.2 The CP-PLL based serializer and the burst-mode CDR
    - 7.2.1 Outlook
- Bibliography
- Appendix
- A Second-Order Systems
  - A.1 Introduction : Time and Frequency Domain Relationships
    - A.1.1 General Second-Order Systems in the Frequency Domain
    - A.1.2 General Second-Order Systems in the Time Domain
    - A.1.3 Determination of Phase Margin and Crossover Frequency from $\xi$ and $\omega_n$
  - A.2 A Practical Case Study : PLL Parametrization
    - A.2.1 Definitions
    - A.2.2 Design Example
- B Methods for Hand Calculations and Model Based Simulations
  - B.1 Model Cores
    - B.1.1 Reference Clock Generator
    - B.1.2 Phase/Frequency Detector
    - B.1.3 Charge-Pump
    - B.1.4 White Jittered Voltage Controlled Oscillator
    - B.1.5 Probes
  - B.2 Source Cores
  - B.3 Script Cores
    - B.3.1 Parametrization
    - B.3.2 Evaluation

December 17, 2007

# Chapter 1

# Introduction

Different disciplines get together within large scale nuclear and high energy physics experiments to address a wide variety of problems. In this chapter, a brief introduction to issues connected to experimental system development is given, namely FE and data transmission ASIC designs, together with driving motivations and methodologies currently in use.

## 1.1 Front-End and Data Transmission ASICs

In high energy physics instrumentation, FEs form the interface between the detector and the read-out system. Data are transferred through read-out channels (ROC), and FEs provide the first analog interpretation of the detection. They perform functions such as signal amplification and shaping, channel equalization, quantization and zero suppression. After the FEs encapsulate the raw data in a proper way, the read-out takes over and the data, which are usually serialized, flow to the next destination through a DAQ system.

Depending on what is being detected, detectors and their FEs may have significantly different requirements, which impose application-specific architectural choices. The energy region of interest and the statistical requirements that define the speed of the channel also have a severe impact on architectural choices.

In a detector, each individual detection component (e.g. a pad of an array of pads located on a planar surface) is expected to produce either a binary "yes/no" or an analog piece of information as the result of a detection. Fig. 1.1 shows the generic tracking principle. It represents a gaseous chamber for 3D tracking, but the principle covers most types of tracking detectors used in the field.

Figure 1.1: Generic tracking principle.

In Fig. 1.1, the ions created by the ionizing particle in the tracker cannot recombine with their electrons, because the electric field is strong enough to separate each electron from its ion. The electrons then drift through the volume until they reach the positive pad plane<sup>1</sup>. The drift time of an electron is proportional to its distance from the pad plane. Ionization events are visible as peaks in the *time proportional histograms* created by the ADCs<sup>2</sup> that sample the related pads. Fig. 1.2 shows an example of a time proportional histogram of an individual pad. The number of counts, or equivalently the amount of charge, is shown on the y axis, whereas the time bin, or equivalently the distance of the ionization from the pad plane, is shown on the x axis. Therefore, the distance of these peaks from the origin can be used as one of the dimensions while constructing the event.

<sup>1</sup>At the same time, the ions also begin to drift in the opposite direction. However, they are not used for detection since they are significantly slower, and the resulting response would be too slow for ambitious timing and spatial resolutions.

<sup>2</sup>ADC stands for Analog-to-Digital Converter.

Figure 1.2: Time projection histogram of an individual pad.

Due to the acceleration of the electrons created during primary ionization events, secondary ionization events can take place, depending on the medium within the detector. The cones indicate these possible secondary ionization events. Ellipses on the pad plane indicate the real hit regions. The flat colored region on the pad plane is the interpreted result. The pad and row numbers are treated as (x, y) plane coordinates and, together with the z value, allow a 3D view of the event being processed to be rebuilt either on-line or off-line [19]. Fig. 1.3 shows an event of cosmic particles detected within such a gaseous tracker in 2D, where the color codes represent the bin number holding the maximum ADC count, corresponding to the distance of the highest-charge ionization [19]. The gaps that appear as white space are the result of a malfunction in the acquisition chain.
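As a minimal sketch of how the third coordinate follows from such a histogram, the snippet below finds the time bin holding the maximum ADC count and converts it to a distance from the pad plane. The drift velocity, the sampling period, the function names and the sample histogram are all illustrative assumptions, not parameters of the detector shown in the figures.

```python
# Hypothetical values, chosen only for illustration.
DRIFT_VELOCITY_CM_PER_US = 2.0   # assumed constant electron drift velocity
SAMPLING_PERIOD_US = 0.1         # assumed ADC sampling period (one time bin)


def peak_time_bin(adc_counts):
    """Return the index of the time bin holding the maximum ADC count."""
    return max(range(len(adc_counts)), key=lambda i: adc_counts[i])


def drift_distance_cm(time_bin):
    """Convert a time bin into a distance from the pad plane (z)."""
    return time_bin * SAMPLING_PERIOD_US * DRIFT_VELOCITY_CM_PER_US


def reconstruct_point(pad, row, adc_counts):
    """Combine pad and row numbers (x, y) with the drift distance (z)."""
    return (pad, row, drift_distance_cm(peak_time_bin(adc_counts)))


# One pad's time proportional histogram with a single ionization peak.
histogram = [0, 1, 0, 2, 9, 31, 12, 3, 0, 0]
point = reconstruct_point(pad=4, row=7, adc_counts=histogram)
print(point)
```

A real reconstruction would typically fit the peak shape and handle several peaks per pad; taking the single maximum bin only mirrors the color coding described for the cosmic event.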

Having such a detector, one needs to "read" the information on the individual detection component, "written" by the "physics event" that occurred. The electronic circuit that reads this information directly from the detection component is integrated within a chip called the front-end (FE).

After the front-end stage, the data must be accompanied by a proper header and/or trailer (as prefix and/or postfix) so that they can be embedded into the flow of data coming from all the other parts of the entire detector system. Data transmission chips come into play at this stage. They establish the connection<sup>3</sup> between far ends (e.g. detector and counting room), guarantee the successful data transfer<sup>4</sup> and pass the data to the next stage, which is usually called a Data AcQuisition (DAQ) system.

Figure 1.3: A cosmic event.

FE and data transfer ASICs pose challenges governed by the well-known set of trade-offs of VLSI design<sup>5</sup> [91]. Additionally, their design becomes even harder due to the harsh radiation environment of HEP experiments. As an example, enclosed device geometries, which are usually not provided by the fabrication foundry, make device modeling a more involved task and decrease the likelihood of first-time fabrication success. Another example is the higher current levels used to overcome single event upsets (SEU), which make it difficult to optimize operational parameters.

<sup>3</sup>Physically, this can be interpreted as frequency/phase locking and hand shaking protocols.

<sup>4</sup>Bit Error Rate (BER) is usually used to measure the success level of a digital connection.

<sup>5</sup>This trade-off set is known as the VLSI design octagon.

## 1.1.1 The Need for Full Custom Design

Experimental nuclear and HEP setups are usually *unique*, requiring very specific solutions. In particular, these include high density in the front-end electronics to meet spatial resolution requirements, and tolerance to very high ionizing radiation levels for both the front-end and the data transmission ASICs. Additionally, experimental systems usually have a very high total number of read-out channels (e.g. of the order of tens of millions), requiring low power (e.g. of the order of a few tens of mW per channel) and small area. The number of channels integrated onto a single die can range from a few to a few hundred, depending on the signal processing needs. Unique experimental systems can easily impose non-standard signal processing functionality which cannot be satisfied by the products on the market (e.g. the functionality of an FE ASIC). Where commercially available high-performance products exist, they are either not radiation hard (e.g. data transmission ASICs) or prohibitively expensive.
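To see why the per-channel power budget matters at these channel counts, a rough estimate is enough. The numbers below are assumed representatives of the orders of magnitude quoted above ($N_{ch} = 10^{7}$ channels and $P_{ch} = 20\,\mathrm{mW}$ per channel), not figures for any specific experiment:

$$
P_{tot} = N_{ch} \, P_{ch} = 10^{7} \times 20\,\mathrm{mW} = 2 \times 10^{5}\,\mathrm{W} = 200\,\mathrm{kW}.
$$

Under these assumptions, every additional mW per channel adds about 10 kW that has to be delivered to the detector and removed from it as heat.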

Considering the above issues, it can be concluded that nuclear and HEP experiments, both of which can have very specific needs, require custom-designed, high-density, radiation-hard ASICs, in particular for front-end and data transmission applications.

## 1.2 VLSI Design Methodology

The semiconductor industry's growing ability to integrate functionality onto silicon requires that digital and analog circuits be increasingly integrated on the same chip. Communication systems in particular must interface to the physical communication media, and those media are analog by definition. Additionally, mixed-signal design is key to overcoming the communication bottlenecks that exist in all high-performance computing systems. The mixed-signal design process has changed relatively little over the past two decades and, in comparison to the digital design process, is slow, labor-intensive, and error-prone. While digital design has improved its methodology and adopted design automation, this is not the case for mixed-signal design.

## 1.2.1 Design Drivers

The design of mixed-signal systems is getting more challenging, which increases the pressure to fix the mixed-signal productivity problem. The need to complete increasingly involved designs more quickly, the need to increase the predictability of the design process and the reuse of existing designs, and the increasingly fluid nature of design and process requirements are the challenges that mixed-signal designers have to face. Because of the challenges listed below, new design paradigms are needed for a sustainable mixed-signal era.

#### Time Consideration

Large scale experimental physics consists of many projects running in parallel, which usually share a common deadline or relatively close deadlines. To finalize a mixed-signal chip in a timely manner, one must have a design methodology that reduces the number of design and silicon iterations, maximizes designer efficiency, and makes it possible to effectively use more designers to speed up the whole process by avoiding the so-called *baby creation* problem. With the existing baby creation process, it takes nine months to create a new human baby. Adding more women to the process does not get the baby out any faster. To a large extent, the same is true of the current mixed-signal design process. Assigning more designers can increase the design speed slightly, but there are severe limitations. The current bottom-up design process offers limited opportunities for parallelism, since it contains several inherently serial tasks which cannot be parallelized.
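The limit set by inherently serial tasks can be made quantitative with Amdahl's law. The text does not invoke it; it is brought in here only as an illustration, under the assumption that a fixed fraction $s$ of the design work is inherently serial and the rest can be shared evenly among $N$ designers. The speed-up relative to a single designer is then

$$
S(N) = \frac{1}{s + \dfrac{1 - s}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{s}.
$$

For a purely hypothetical $s = 0.5$, four designers give $S(4) = 1/(0.5 + 0.125) = 1.6$, and no number of designers can push the speed-up beyond 2.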

#### Complexity

Circuits are getting more complex in two different ways at the same time. First, they get larger in terms of number of transistors and level of functionality, with an average growth rate of 30x per decade. Second, the operation of the circuits gets more complex, with a growth rate of 10x per decade. A clear example is control systems such as the phase-locked loops presented in this thesis. The combined result of these two effects is that the verification complexity of such systems increases at a rate of 300x per decade. Even though CAD<sup>6</sup> tools with increasing functionality exist, the verification complexity of mixed-signal systems is far beyond what they can address.
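The 300x figure is the product of the two per-decade growth factors. The short sketch below reproduces that arithmetic and converts the result into an equivalent year-on-year factor. It assumes that the two effects multiply and that growth is a constant compound rate; the function names are illustrative.

```python
def combined_growth(size_growth, operation_growth):
    """Verification complexity growth, assuming the two effects multiply."""
    return size_growth * operation_growth


def annual_factor(growth_per_decade):
    """Equivalent constant year-on-year factor for a per-decade growth."""
    return growth_per_decade ** (1.0 / 10.0)


per_decade = combined_growth(30, 10)
per_year = annual_factor(per_decade)
print(f"{per_decade}x per decade corresponds to about {per_year:.2f}x per year")
```

Under these assumptions the verification effort grows by roughly 1.8x every year, which gives a sense of how quickly verification tools must improve just to keep pace.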

#### Reuse

An important part of the strategy for increasing mixed-signal productivity is reuse, which comes in two types: intellectual property and derivatives. In both cases, a piece of design cannot be reused by another design in a straightforward manner, because complete documentation is lacking, because differences in the technological process make it hard to do so, or both. Generally, re-engineering is required.

#### Fluid Design and Process Requirements

Especially in the start-up phase of a design, specifications and/or technological process parameters can shift during the design process. Additionally, a design may be required to migrate to a new application or process after completion of the initial version.

#### Gigabit I/O

In many applications, an economic judgment is needed on whether it is better to integrate the mixed-signal part of the design onto the same chip as the rest of the design to reduce costs *or* to separate them to reduce risks. Applications like wireless can make such a judgment, whereas for high-speed I/O there is no such choice: the mixed-signal circuitry must reside on the same chip.

 $<sup>^6\</sup>mathrm{CAD}$  stands for Computer Aided Design.

## 1.2.2 Bottom-Up vs. Top-Down

The traditional approach to design is referred to as bottom-up design, in which the design process starts with the design of individual blocks. These blocks are then combined to form the whole system. The design of the blocks starts with a set of specifications and ends with a device-level implementation, as seen in Fig. 1.4. After schematic capture and simulation, the circuit layout is done and checked against the design rules (DRC) provided by the foundry. Parasitic device extraction is then performed to obtain the *extracted circuit*, which is a more physical representation of what was captured as schematic. The extracted circuit layout is then checked against its schematic implementation (LVS). In case of failure at a check point, the backward arrows in the figure take over. Each block is verified as a standalone unit against its specifications and not in the context of the overall system. Once verified individually, the blocks are combined and verified together, and at this point the entire system is represented at the device level. Even though it is efficient for small designs, the bottom-up approach has the following features, which are somewhat problematic for large designs:

- 1. Once the blocks are combined, simulation takes a long time and verification becomes excessively hard or sometimes impossible.
- 2. For complex designs, the greatest impact on the performance, cost and functionality is typically found at the architectural level where bottom-up design sets severe limitations for architectural exploration.
- 3. Problems arising at the combining phase are expensive to solve, since they involve re-design of the blocks.
- 4. Bottom-up design has important and expensive steps to be performed serially. This stretches the time required to complete the design.

Figure 1.4: Process diagram of bottom-up design.

However, a well-designed top-down methodology proceeds from architecture to device-level design, as seen in Fig. 1.5. First, the architecture is implemented with a high-level description, either at register-transfer level, at gate level, or partly in both at the same time. Using the target library, a gate-level description is generated, then a gate-level net-list is produced, or schematic capture is performed as an alternative to HDL code. Digital simulations follow to verify the functionality. Using standard cells, placement and routing are done to create the actual circuit layout automatically. Final logic simulations are performed to verify actual delays and circuit performance. Each level is thus fully designed before proceeding to the next, and each level is fully leveraged in the design of the next. Doing so reduces the impact of changes that come late in the design cycle. Some basic principles of top-down design are the following:

- 1. A shared design representation is used for the entire length of the project, which allows the design to be simulated by all members of the design team and in which all types of descriptions, be they behavioral, circuit or extracted layout, can be co-simulated.
- 2. During the design process, each change to the design is verified in the context of the entire, previously verified, design as dictated by the verification plan.
- 3. Top-down is a design process which involves multiple passes, starting from high level abstraction and refining as the details of individual blocks become available.
- 4. To the degree possible, specifications and plans should be manifested as executable models and scripts, things which are used in the design process on a daily basis, rather than as written documents.

Figure 1.5: Process diagram of standard-cell top-down design.
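The fourth principle, specifications kept as executable artifacts, can be sketched as follows. This is only a minimal illustration: the quantity names, limits and units are hypothetical and do not come from any design described in this thesis.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SpecLimit:
    """One line of a specification, expressed as an executable check."""
    name: str
    minimum: float
    maximum: float
    unit: str

    def check(self, value: float) -> bool:
        return self.minimum <= value <= self.maximum


# Hypothetical limits, for illustration only.
SPEC: Tuple[SpecLimit, ...] = (
    SpecLimit("gain", 3.5, 4.5, "mV/fC"),
    SpecLimit("power", 0.0, 10.0, "mW"),
)


def verify(results: Dict[str, float]) -> Dict[str, bool]:
    """Return pass/fail per specification; a missing result counts as a failure."""
    return {
        limit.name: limit.name in results and limit.check(results[limit.name])
        for limit in SPEC
    }


if __name__ == "__main__":
    # Simulation results would normally be parsed from simulator output.
    print(verify({"gain": 4.0, "power": 12.0}))
```

Because the limits live in code, the same check can be rerun after every design change, which is the behavior the second principle asks of the verification plan.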

The two methodologies presented briefly so far, namely the bottom-up and top-down approaches, are not the only ones applicable to real-world production systems. Depending on what is being designed and on the type of existing system one has, either can be chosen where applicable. In the context of this thesis, both approaches were used.

## **1.3** Text Organization

The organization of this thesis is as follows:

**Chapter 2** gives an overview of the COMPASS experiment at CERN. The physics program, the detector and the RICH operation principle are briefly summarized.

**Chapter 3** introduces the architecture and the circuit designs of the CMAD front-end ASIC, which was realized for the RICH-I detector system of the COMPASS experiment.

**Chapter 4** deals with the test of the CMAD. The test environment is described and measurement results are presented in this chapter.

**Chapter 5** introduces the CP-PLL based serializer designed for the GBT transceiver system. An overview of the system, network configurations, operational conditions and practical implementation issues such as loop parameter selection and model-based simulations are detailed. The device-level implementations are also given in this chapter.

**Chapter 6** presents a burst-mode capable Clock and Data Recovery (CDR) circuit design as a possible functional extension to the GBT transceiver. An introduction to the clock and data recovery concept, a proposed classification of CDRs and the implementation details of a proof-of-concept test circuit are presented.

**Chapter 7** summarizes the work and provides an outlook for future work.

**Appendix A** provides the behavioral model for second-order systems relating to all negative feedback loops in this thesis, namely the operational amplifiers in the CMAD and the CP-PLL itself, with practical details.

**Appendix B** presents all hardware model cores written in Verilog and/or Verilog-A. The software tool named CaPPeLLo<sup>7</sup>, developed for calculating loop parameters and evaluating CP-PLL behavior and jitter performance, is also presented briefly. Script cores which were used to calculate results presented in the text are provided.

 $<sup>^7\</sup>mathrm{CaPPeLLo}$  stands for CP-PLL parametrizer, developed in C/C++ programming and Octave scripting languages, in the framework of this thesis.

# Chapter 2

# The COMPASS Experiment at CERN

The COMPASS<sup>1</sup> experiment is a continuation of the EMC<sup>2</sup>, NMC<sup>3</sup> [55], and SMC<sup>4</sup> experiments at CERN. Starting from 1995, the SMC and HERMES<sup>5</sup> collaborations designed experiments for muon physics and hadron spectroscopy that seemed to be very similar in the foreseen setup. The Hadron Muon Collaboration (HMC) [53] proposed to investigate the spin structure of the nucleon by scattering muons off a polarized target. The Charm Experiment with Omni-Purpose Setup (CHEOPS) [54] was interested in semi-leptonic decays of charmed baryons. The COMPASS muon and hadron programs maintain the original formulation of the physics questions of those proposals.

The main physics objective of the muon beam physics program of COMPASS is the measurement of $\Delta G/G$, the gluon polarization in a longitudinally polarized nucleon. The hadronic program comprises a search for glueballs in the high-mass region in exclusive diffractive pion-proton scattering, a study of leptonic and semi-leptonic decays of charmed hadrons with high statistics and precision, and Primakoff scattering with various probes.

<sup>&</sup>lt;sup>1</sup>COmon Muon and Proton Apparatus for Structure and Spectroscopy.

<sup>&</sup>lt;sup>2</sup>European Muon Collaboration.

<sup>&</sup>lt;sup>3</sup>New Muon Collaboration.

<sup>&</sup>lt;sup>4</sup>Spin Muon Collaboration.

<sup>&</sup>lt;sup>5</sup>The experiment at the HERA accelerator at the DESY.

Among its physics goals, there are studies of the spin dependent structure function  $g_1$  of the proton and the deuteron, flavor dependent quark polarization in a nucleon and parton distributions in a transversely polarized target. A detailed investigation of charmed and doubly charmed baryons is performed in the second stage of the experiment.

## 2.1 Physics Overview

Neutrons and protons are the basic building blocks of matter. They form the atomic nucleus, hence the name *nucleon*, and are responsible for the major part of the atomic mass. In our current understanding, they are composed of *quarks* bound by the strong *color force*.

Six different quarks, the flavors, are known. Sorted according to their masses, they are (from the lightest to the heaviest): Up (u), Down (d), Strange (s), Charm (c), Bottom (b) and Top (t). In the naive constituent quark model, nucleons are described as a combination of three constituent quarks. Together they define the properties of the nucleon, like *charge* and *mass*. Different combinations of flavors result in different types of nucleons: protons consist of (*uud*), neutrons of (*udd*).
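As a quick check of how the quark content fixes the charge of a nucleon, the sketch below sums the electric charges of the constituent quarks. The charge values (+2/3 for u, c, t and -1/3 for d, s, b, in units of the elementary charge) are standard but are not quoted in the text; the function name is illustrative, and anti-quarks are not handled.

```python
from fractions import Fraction

# Electric charge of each quark flavor in units of the elementary charge e.
QUARK_CHARGE = {
    "u": Fraction(2, 3), "c": Fraction(2, 3), "t": Fraction(2, 3),
    "d": Fraction(-1, 3), "s": Fraction(-1, 3), "b": Fraction(-1, 3),
}


def hadron_charge(quark_content: str) -> Fraction:
    """Sum of quark charges for a string such as 'uud' (quarks only)."""
    return sum((QUARK_CHARGE[q] for q in quark_content), Fraction(0))


if __name__ == "__main__":
    for name, content in (("proton", "uud"), ("neutron", "udd")):
        print(f"{name} ({content}): charge = {hadron_charge(content)} e")
```

Exact fractions are used instead of floating-point numbers so that the proton charge comes out as exactly 1 and the neutron charge as exactly 0.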

Besides an electrical charge, the quarks also carry a strong charge, the *color*: quarks can be *red* (r), *blue* (b) or *green* (g), while anti-quarks carry the respective *anti-colors* anti-red ($\bar{r}$), anti-blue ($\bar{b}$) or anti-green ($\bar{g}$). Since its introduction to particle physics, color has never been observed on a free particle: quarks always appear in *white clusters* called *hadrons*. Two ways to get white hadrons have been observed so far: *mesons* are built of one quark and one anti-quark in such a way that their colors neutralize each other, while *baryons* are made up of three quarks whose color combination (rgb), in the same way as in optics, also makes white. In this scheme the nucleons are only two hadrons out of many: they are the *three-quark systems* consisting of u- and d-quarks only and are therefore the lightest baryons.

In the framework of *Quantum Chromodynamics* (QCD) the interaction of the quarks and their color fields is described, in analogy to the very successful *Quantum Electrodynamics* (QED), by the exchange of field quanta, the so-called *gluons*. The strength of the interaction between quarks and gluons is described by the *coupling constant*, $\alpha_s$. In QED the field quanta do not carry charge and therefore cannot interact with each other, whereas in QCD gluons are colored and do show self-interaction, which leads to a much more complex interaction scheme compared to that of QED.

It turned out that the interaction of quarks is rather weak when they come very close together, i.e. when their kinetic energies are high. It can then be described assuming a one-gluon exchange similar to the one-photon exchange in the QED case. Due to the small coupling constant $\alpha_s$, multi-gluon exchanges are unlikely: at around 100 GeV, $\alpha_s$ was measured to be only of the order of 1/10. It is therefore sufficient for many applications to calculate just the one-gluon exchange and add more gluons only as a small disturbance. This is called *perturbative QCD*.

Surprisingly, in the case of large distances and, accordingly, small quark energies, the interaction gets *stronger*. To allow for more gluons in such interactions, the coupling is not taken to be constant but assumed to get larger for larger distances. Hence it is called a *running coupling constant*. As a consequence, it is impossible to separate two quarks, as the force field acquires so much energy that it finally bursts into a $q\bar{q}$ pair, conserving the white *color*. The fact that quarks are not separable is called *confinement*.

At the nucleon's energy scale of 1 GeV, $\alpha_s$ is already more than 0.3. Consequently, the behavior of quarks and gluons, and therefore the structure of the nucleon, can no longer be described in the same perturbative way by a simple one-gluon exchange; many-gluon exchanges have to be taken into account as well. The method breaks down at the latest when $\alpha_s$ is approximately 1. In the last few years, much effort has been spent on finding new ways of describing low-energy QCD. The most promising ones are *Lattice QCD* and *Chiral Perturbation Theory*. In Lattice QCD the field equations are solved exactly on a grid with a finite spacing, using a huge amount of computer power. Chiral Perturbation Theory uses the *chiral symmetry* of QCD at low quark momenta, which is approximately valid since the quark masses can be neglected because they are still much smaller than the quark momenta.
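The running of the coupling between the two scales quoted above can be reproduced roughly with the standard one-loop formula. This is only a sketch under stated assumptions: the reference value $\alpha_s(M_Z) \approx 0.118$ at $M_Z \approx 91.19$ GeV is not taken from the text, the number of active flavors is fixed to 5 at all scales (quark thresholds are ignored), and higher-order terms are neglected, so the low-energy value is indicative only.

```python
import math

# Assumed reference point (not quoted in the text).
ALPHA_S_MZ = 0.118
M_Z_GEV = 91.19


def alpha_s_one_loop(q_gev: float, n_flavors: int = 5,
                     alpha_ref: float = ALPHA_S_MZ,
                     mu_ref_gev: float = M_Z_GEV) -> float:
    """One-loop running of the strong coupling from a reference scale.

    alpha_s(Q) = alpha_ref / (1 + alpha_ref * b0 * ln(Q^2 / mu_ref^2)),
    with b0 = (33 - 2 * n_flavors) / (12 * pi).
    """
    b0 = (33.0 - 2.0 * n_flavors) / (12.0 * math.pi)
    denominator = 1.0 + alpha_ref * b0 * math.log(q_gev ** 2 / mu_ref_gev ** 2)
    if denominator <= 0.0:
        raise ValueError("scale is below the pole of the one-loop formula")
    return alpha_ref / denominator


if __name__ == "__main__":
    for q in (100.0, 10.0, 1.0):
        print(f"Q = {q:6.1f} GeV -> alpha_s ~ {alpha_s_one_loop(q):.3f}")
```

Within these approximations the coupling is close to 1/10 around 100 GeV and exceeds 0.3 at 1 GeV, in line with the values given in the text.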

In the nucleon and in all baryons in general, quarks move at distances for which the low-energy models mentioned above are just starting to be applicable. In this region many peculiar features of QCD show up. Thus, inside the nucleons a large number of so-called *sea quarks* were found, which have their origin in gluons fluctuating for short times into quark anti-quark pairs. Furthermore, while in atoms the energy stored in the binding of the electrons to the protons with opposite electrical charge is very small<sup>6</sup>, the binding energy is surprisingly large in the nucleon. Despite the large number of sea quarks, the contribution of quarks to the total momentum of the nucleon was measured to be only around 50%, the rest is contributed by the *binding*, the gluons. These results helped considerably to advance the theoretical understanding of the nucleon: from a vacuum filled with three point-like quarks, one moved to some kind of plum-pudding model where quarks are embedded into a background of gluons like raisins in the pudding. This background, however, transforms constantly back and forth into quarks and anti-quarks as well. The small excess of three valence quarks over the large number of quark anti-quark pairs in the sea defines the type of the nucleon. The constituent quark mentioned at the beginning finally became only an *effective* particle consisting of a valence quark with a large cloud of gluons and sea quarks around.

The *spin* is a very sensitive probe of the forces reigning in the nucleon and is another static property of quarks and nucleons. It seems that it plays a more important role in high-energy particle production than expected. Traditionally the spin of particles was considered of little interest, as it was thought not to influence particle production at all. In high-energy multi-particle production the particle's mass is small compared to its energy, and thus it was assumed that its spin behavior should be as simple as that of massless particles. Surprisingly, experimental data did not confirm this assumption: more than 25 years ago, in collisions of high-energy protons of 300 GeV with beryllium nuclei, scientists found Lambda particles, $\Lambda^0$ (uds), for which the spin direction was not distributed homogeneously, but mainly perpendicular (i.e. transverse) to the production plane spanned by proton and $\Lambda^0$, even though neither proton nor beryllium were polarized themselves [7].

<sup>6</sup>Compare, for example, the electron's binding energy of 13.6 eV with the hydrogen mass of 938890076.4 eV.

Since then such spontaneous polarization has been found with other interactions as well, exhibiting a regular pattern, which should help to uncover the underlying production mechanism. Yet the observed polarization still can not be comprehensively described and is likely to originate from several sources, being related to either the structure of the baryon itself or to the production process. Measuring this quantity therefore gives an important insight into the world of the quarks and how they put themselves together to make up baryons.

While much data on transverse polarization phenomena have already been collected in proton-proton, kaon-proton or pion-proton collisions, no data are available from photo-production. COMPASS, a state-of-the-art experiment currently running at CERN, has the unique chance to provide highly precise data on this topic, which will give a new view of this subject and the possibility to check new models.

## 2.1.1 Muon-Beam Physics Program

In the mid-1970s, the first *deep inelastic scattering* (DIS) experiments with polarized beams started to operate at SLAC [27] [30] to investigate a new degree of freedom, the *spin*. Polarized DIS experiments continued at CERN with the European Muon Collaboration (EMC). The EMC discovered [31] [32] that the Ellis-Jaffe sum rule [33] is violated, a fact known as the *spin crisis*. In the simplest approach of the Quark Parton Model (QPM), there are three valence quarks with spin 1/2 in the nucleon. The spins of two quarks are parallel to the nucleon spin and one quark has its spin anti-parallel. In this way, a nucleon spin equal to 1/2 is recovered. In this simplest approach the quantity which measures the fraction of the nucleon spin carried by the quarks, $\Delta\Sigma$, is equal to 1. Taking relativistic effects into account, $\Delta\Sigma \approx 0.6$ is expected [34]. The value measured by EMC was $\Delta\Sigma \approx 0.12 \pm 0.09 \pm 0.14$, far from the expectations. It was a surprise, as the QPM successfully described e.g. hadron charges, their anomalous magnetic moments and mass differences between hadrons. Further experiments, i.e. the Spin Muon Collaboration (SMC) [35] at CERN, E142 [38], E143 [41], E154 [42], E155 [43] at SLAC and HERMES [45] [50] at DESY, confirmed the EMC observation with better accuracy.

In a more realistic approach the spin of the nucleon may be carried by quarks, $\Delta\Sigma$, gluons, $\Delta G$, and by the orbital angular momentum of quarks and gluons, $L_q$ and $L_g$, respectively, as

$$\frac{1}{2} = \frac{1}{2}\Delta\Sigma + \Delta G + L_q + L_g \tag{2.1}$$

where $\Delta\Sigma$ is already reasonably well known $(0.30 \pm 0.04 \pm 0.09)$ [51], while the other terms are still un-tackled and need to be measured.
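Eq. (2.1) together with the quoted value of $\Delta\Sigma$ fixes how much of the nucleon spin is left for the unmeasured terms $\Delta G + L_q + L_g$. The sketch below performs this arithmetic; combining the statistical and systematic uncertainties in quadrature is an assumption of this example, not a procedure taken from the cited measurement.

```python
import math
from typing import Tuple


def remaining_spin(delta_sigma: float, stat: float, syst: float) -> Tuple[float, float]:
    """Return (Delta G + L_q + L_g, uncertainty) from Eq. (2.1).

    1/2 = 1/2 * delta_sigma + (Delta G + L_q + L_g)
    The two uncertainties on delta_sigma are added in quadrature
    (an assumption of this sketch) and scaled by the factor 1/2.
    """
    total_uncertainty = math.hypot(stat, syst)
    remainder = 0.5 - 0.5 * delta_sigma
    return remainder, 0.5 * total_uncertainty


if __name__ == "__main__":
    value, error = remaining_spin(0.30, 0.04, 0.09)
    print(f"Delta G + L_q + L_g = {value:.2f} +/- {error:.2f}")
```

With the quoted numbers, about 0.35 of the total 1/2 must come from gluons and orbital angular momentum, which is why these terms are the focus of the measurements described next.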

The spin contributions of the quarks are experimentally accessible in deep inelastic scattering, where one does not scatter on the complete nucleon, but on partons carrying only a fraction of the nucleon's momentum. In *inclusive* measurements only the scattered lepton is detected, including *all* possible reactions of the target nucleon. From the kinematics of the lepton alone, the contributions of both quarks and anti-quarks to the spin could be extracted. The experiments of the next generation are hoped to find out more about the flavor-decomposed quark contributions or gluon polarization and orbital momenta. For this, more information on the scattered quark is needed. It is expected to hadronize into the hadron with the highest momentum (*leading hadron*), which has to be identified and its kinematics determined in *semi-inclusive* Deep Inelastic Scattering (SIDIS).

$\Delta G$ can be probed in so-called *Photon-Gluon-Fusion* (PGF), while orbital momenta are accessible in *Deeply Virtual Compton Scattering* (DVCS). In recent years an additional, equally important spin distribution function, the so-called *transversity*, was found to be very interesting.

COMPASS will provide precise data on these topics. Both beam and target are polarized. While the spin direction of the  $\mu$  is more or less fixed by the accelerators, the target spin is rotated 3 times a day. To reduce systematic uncertainties, measurements with different target polarizations are compared in such a way that experimental biases cancel each other out. Gluon polarization is given priority here.

#### **Gluon Polarization**

The most promising candidate for a significant contribution to the spin of the nucleon not attributed to the quarks is the gluon. As with the missing momentum of the nucleon, one hopes to find a good fraction of the nucleon spin with these exchange bosons. Unfortunately the spin of the gluons is not easy to probe: the clearest polarized probes are polarized photons, as e.g. in polarized muon scattering. But as gluons do not carry electric charge, they cannot interact with photons directly. One has to use a second-order process, in which the photon interacts with the gluon via an intermediate quark line. Such a process is called *photon gluon fusion* (PGF). In order to enrich the data sample with PGF events, two ways are followed in COMPASS, namely *Open Charm Production* and events with *high* $p_T$.

#### **Open Charm Production**

In leading order, heavy quarks are produced via PGF; contributions from the sea quarks or from the fragmentation process of light quarks are small. COMPASS will therefore measure the spin-dependent asymmetry for charm muoproduction [74], which is obtained from the numbers of charm events for muon spin parallel and anti-parallel to the target spin.
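A raw counting asymmetry of this kind can be sketched as follows. The sign convention (anti-parallel minus parallel), the binomial error formula and the event counts are assumptions made for illustration; a physics asymmetry would additionally require corrections for beam and target polarization, dilution and acceptance, which are not modeled here.

```python
import math
from typing import Tuple


def counting_asymmetry(n_parallel: int, n_antiparallel: int) -> Tuple[float, float]:
    """Raw asymmetry (N_anti - N_par) / (N_anti + N_par) and its
    statistical uncertainty, assuming binomial counting statistics."""
    total = n_parallel + n_antiparallel
    if total <= 0:
        raise ValueError("at least one event is required")
    asymmetry = (n_antiparallel - n_parallel) / total
    uncertainty = math.sqrt((1.0 - asymmetry ** 2) / total)
    return asymmetry, uncertainty


if __name__ == "__main__":
    # Hypothetical event counts.
    a, sigma = counting_asymmetry(n_parallel=4900, n_antiparallel=5100)
    print(f"raw asymmetry = {a:.4f} +/- {sigma:.4f}")
```

The example shows why high statistics are essential: with ten thousand events the statistical uncertainty is about 0.01, comparable to a raw asymmetry of a few percent.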

#### High $p_T$ Events

An alternative to the limited cross sections of open charm production is looking for all possible quark types, which manifest themselves in the production of two jets with opposite azimuth [39]. In the case of the moderate energies of fixed-target<sup>7</sup> experiments the jets shrink only to the two leading hadrons, which still reflect the original quark flavor and direction of the hard process. While the statistics situation is much better compared to the charm-only case, the background situation is less favorable.

#### Transversity

In leading order, three independent quark contribution functions are needed for the complete description of the nucleon. Besides the already mentioned q and $\Delta q$ as the number density and the polarization of the quarks of flavor q, the transverse spin contribution $\Delta_T q$ is additionally necessary. It describes the quark distribution in a transversely polarized nucleon with respect to a longitudinally polarized beam, in analogy to $\Delta q$ as the difference between quark spins parallel and anti-parallel to the nucleon spin. In the naive picture $\Delta_T q$ should be the same as $\Delta q$. But this is only valid in the non-relativistic regime, whereas for the quarks in the nucleons, the two distributions are expected to differ from each other. Their discrepancy is a measure of the size of the relativistic effects and is therefore interesting.

<sup>7</sup>One of the two fundamental types of HEP experimental systems, the other one being the so-called collider, having different advantages and disadvantages.

Unfortunately, transversity cannot be directly probed in normal DIS, as the necessary process exhibits an inapt symmetry behavior: an involved spin flip changes the symmetry state from chiral-even to chiral-odd. Such processes have to be compensated with another chiral-odd effect, as in a sequence of two such processes (as in Drell-Yan processes) or when they are followed by an equally chiral-odd fragmentation process, the so-called *Collins Fragmentation*. This complication is the reason why transversity has not yet been measured and is now on the list of the physics goals of COMPASS. COMPASS wants to measure the azimuthal distributions of the leading $\pi$, which should, according to Collins, show an asymmetric behavior that can be related to $\Delta_T q$. Again SIDIS is the key to this topic [47].

#### DVCS

As already mentioned, the orbital momentum of the quarks  $L_q$  is a very interesting quantity. This field has opened just recently, when a connection was found between the total angular momentum of the quarks  $J = 0.5\Delta\Sigma + L_q$  and the so-called *Generalized Parton Distribution* functions, GPDs [46]. It has been proposed to use Deeply Virtual Compton Scattering (DVCS) in which the virtual photon emitted from the muon beam is scattered off the target nucleon and becomes real, for the extraction of these GPDs in COMPASS.

The measurement of DVCS was not in the COMPASS proposal of 1996, but was proposed only much later. First feasibility studies and measurements have already given positive results. A measurement of DVCS is intended in the second phase of COMPASS after 2005.

## 2.1.2 Hadronic Physics Program

The hadronic physics program continues the experimental efforts on investigating the strong interaction inside the nucleon via spectroscopy of hadrons with different quark content (e.g. charmed baryons) and non-$q\bar{q}$ systems (e.g. glueballs).

The need for very high statistics is common to all projects; it is due to the small cross sections for charmed baryons or glueballs and high-mass diffractive systems. Therefore high-intensity beams are needed. The various beam energies between 100-300 GeV available at COMPASS permit very clear systematic studies, and different beam particles ($\pi$, K and p) open the possibility for studies in different environments with the same setup.

One of the key goals is the investigation of baryons with c-quark content, so-called *charmed* and *doubly-charmed* baryons. Such measurements require a highly optimized layout and are therefore not possible in the initial setup, but only later in phase 2, which started in 2006. Many other hadronic topics, however, could be pursued earlier, for example the production of exotic states and the polarizability of the $\pi$.

#### **Exotic States**

Exotic state is the name for hadrons which cannot be described as $(q\bar{q})$ or (qqq) systems. Different scenarios are possible. One of the most fascinating features of QCD is the fact that the transmitters of the color force, the gluons, carry color charge themselves and therefore should be able to form bound states among each other, so-called *glueballs*. Intermediate formations are *hybrids*, for which both valence quarks and exotic gluonic degrees of freedom are present. Currently it is expected that even if these objects with gluonic degrees of freedom can be formed, they have a rather short lifetime and will be difficult to find in the presence of background from other hadronic resonances. One can use the fact that objects with gluon content can form quantum states that cannot be reached with fermions alone, the so-called *exotic quantum numbers*.

Additionally, it is interesting to look for hadronic structures which contain more than 3 quarks, like Hexaquarks, Pentaquarks and Tetraquarks with different combinations of heavy and light quarks and anti-quarks. From a theoretical point of view these objects are very interesting and several predictions are available. But experimentally their discovery is a very challenging task. COMPASS, with its high rate and precise acquisition, will make its contribution in extracting and identifying these objects from the background.

#### **Primakoff Scattering**

The lightest and the simplest quark system is the $\pi$. It consists of one quark (u or d) and one anti-quark ($\overline{u}$ or $\overline{d}$). As free quarks have not been observed yet, the $\pi$ is an interesting object for hadron physicists for investigating quarks and their interactions inside hadrons. The theoretical basis is quite well understood: the $\pi$ plays an important role in *chiral perturbation theory* ($\chi$PT), which has been very successful in the description of strongly interacting systems at low energies in the last few years.

To find out more about the forces keeping the two quarks together, one wants to see the influence of electromagnetic fields acting on the (electrically) charged quarks. As the $\pi$ has a lifetime of only $10^{-8}$ seconds, it is not possible to produce a $\pi$ target. One therefore has to consider *inverse kinematics*, where a $\pi$ beam is scattered off the electric field of a high-Z nucleus and produces a real photon in the so-called *inverse Compton* or *Primakoff reaction*.
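The reason a $\pi$ beam is feasible while a $\pi$ target is not can be seen from relativistic time dilation: at beam energies the mean decay length is far longer than a beam line. The sketch below uses rounded reference values for the charged-pion mass and lifetime, which are not quoted in the text, and evaluates the decay length at the 100-300 GeV beam energies mentioned for the hadron program.

```python
import math

# Rounded reference values for the charged pion (assumed, not from the text).
PION_MASS_GEV = 0.13957
PION_LIFETIME_S = 2.6033e-8
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def mean_decay_length_m(energy_gev: float) -> float:
    """Mean laboratory decay length L = beta * gamma * c * tau of a charged pion."""
    if energy_gev <= PION_MASS_GEV:
        raise ValueError("total energy must exceed the pion rest mass")
    momentum_gev = math.sqrt(energy_gev ** 2 - PION_MASS_GEV ** 2)
    beta_gamma = momentum_gev / PION_MASS_GEV
    return beta_gamma * SPEED_OF_LIGHT_M_PER_S * PION_LIFETIME_S


if __name__ == "__main__":
    for energy in (100.0, 200.0, 300.0):
        print(f"E = {energy:5.0f} GeV -> mean decay length ~ "
              f"{mean_decay_length_m(energy) / 1000.0:.1f} km")
```

At rest the pion travels only a few metres within its lifetime even at the speed of light, whereas at these energies the mean decay length is of the order of kilometres, so most beam pions reach the target.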

#### **Charmed Baryons**

After the formulation of the SU(3) flavor group, which describes the similarities among u, d and s quarks, the scheme was soon extrapolated to additional, yet unobserved quark types. An even better motivation was provided by the GIM mechanism [49], a theory in which the simple phenomenological model of weak interactions involving a single charged vector boson is considered as a quantum field theory. In higher orders of perturbation theory divergences show up, which the authors propose to absorb via the introduction of a 4th fundamental fermion, thereby revealing a suggestive lepton-quark symmetry. Following the rules of group theory, two baryonic (Fig. 2.1) and two mesonic multiplets could be set up, predicting hadrons with c-content. In 1974 the c quark was finally found with the discovery of the $J/\Psi$, consisting of $c\overline{c}$.

Figure 2.1: SU(4) multiplets of baryons made of u, d, s, and c quarks. The 20-plet with an SU(3) octet (left). The 20-plet with an SU(3) decuplet (right). Both from PDG2007.

It turned out that while u, d and s have a rather similar mass of around 100 MeV, the c quark is much heavier (around 1000 MeV). Consequently the symmetry is less well respected when the c quarks are included. On the other hand a larger mass results in a smaller binding distance, which can also be treated as an advantage: the c is already heavy enough for first perturbative methods of QCD to be applicable again, here in the framework of *Heavy Quark Effective Theories*, HQET.

While the knowledge about charmed mesons $(q\bar{q})$ is rather sound, most charmed baryons (qqq) are still experimentally challenging objects: masses, lifetimes and decay widths still have rather large errors. In contrast to the singly charmed baryons, for which all ground states have at least been observed, basically nothing is known about doubly charmed baryons. Just recently the first observation of $\Xi_{cc}^+$ (ccd) was reported in [48], showing a lifetime significantly shorter than the predictions of 100 and 500 fs. The charmed and doubly charmed baryons have opened a new field of investigation [47].

The requirements for the setup are severe. Some optimizations have already been developed (e.g. the so-called *online filter* software package as of 2003, and the RICH-I detector upgrade and its front-end electronics design, namely the CMAD as of 2007, reported in this thesis). Doubly charmed baryons were therefore on the list of long-range plans after 2006.

Figure 2.2: COMPASS physics interest summary.

Considering the above discussion which tries to provide some insight to comprehensive goals of COMPASS physics programs, Fig. 2.2 summarizes the physics overview in the form of a mind map.

## 2.2 Description of the Apparatus

The setup, seen in Fig. 2.3, consists of two spectrometers, one for small-angle and one for large-angle particles, giving a wide angular acceptance for all measurements. Each spectrometer performs full particle identification using one Ring Imaging Cherenkov counter (RICH), electromagnetic and hadronic calorimetry and muon detection. A high momentum resolution is obtained by using highly precise tracking with silicon detectors, gaseous strip detectors and drift tubes. The measurements are performed with high-intensity beams, allowing the needed statistics to be collected.

The beam hits a solid-state target. The spectrometer is composed of different detectors placed along the beam, which enable the reconstruction of the tracks and the momenta of the interaction products for particle identification. The COMPASS setup is approximately 60 m long and is divided into two stages positioned one behind the other. The first stage is designed to detect particles emitted at low momenta (5-50 GeV) and at large angles and is therefore called the Large Angle Spectrometer (LAS). Detectors with high interaction length (calorimeters and muon wall) have a hole in the middle to allow the high-momentum particles to pass undisturbed. These particles are detected in the Small Angle Spectrometer (SAS), which detects particles with momentum from 30 to 100 GeV.

In the subsequent sections the different components comprising the spectrometer are described.



Figure 2.3: COMPASS detector setup [5] in 2006.


#### 2.2.1 Polarized Beam

COMPASS is installed in the experimental hall EHN2 of the CERN North Area. The experiment is served by the M2 beam line which can provide both muon and hadron beams. The line can provide also an electron beam which can be used for test purposes.

The extraction line is shown in Fig. 2.4. A 400 GeV/c primary proton beam is extracted from the SPS and directed towards the primary target T6. The proton intensity on the target varies between $10^{12}$ and $10^{13}$ protons per SPS cycle. A secondary beam is derived from the T6 target. For the muon beam, the tertiary muons arise from pion and kaon decays, and a beryllium absorber stops the hadrons in the beam. When a hadron beam is required, the absorbers are removed and the secondary particles are transported directly to the COMPASS target; in this case Cherenkov detectors (CEDAR) are installed in the beam line to perform particle identification. The beam is focused via a set of dipole and quadrupole magnets. The muon polarization is obtained by selecting a certain energy range via the bending magnets. Due to the spill structure of the proton beam extracted from the SPS, the flux of muons is not continuous but concentrated in 4.8 s (extraction at the SPS) followed by 12 s when no beam is delivered, for a total cycle of 16.8 s. The characteristics [24] of the beam for both muon and hadron programs are shown in Table 2.1.



Figure 2.4: The M2 beam line [5].

|                                  | Muon Program     | Hadron Program |
|----------------------------------|------------------|----------------|
| Particles                        | $\mu^+$          | $\pi, K, p$    |
| Momentum (GeV/c)                 | 60 - 160         | 100 - 300      |
| Intensity (particles/spill)      | $2 \cdot 10^{8}$ | $10^{8}$       |
| Beam size on target (RMS, cm)    | 0.8              | 0.3 - 0.5      |

Table 2.1: Characteristics of the beam for the muon and the hadron programs.

## 2.2.2 The Spectrometer

Due to time and manpower constraints, COMPASS has been upgraded gradually as the physics program has advanced (e.g. RICH-II was not part of the initial setup, and the RICH-I used initially was upgraded for the second phase of the experiment in 2007). The following sections give a brief overview of the experimental components, not all of which are described in detail.

#### Targets

COMPASS uses both muon and hadron beams that can address different physical problems. In order to pass from one program to the other, the spectrometer setup must be slightly modified. The most important difference lies in the target.

The target for the muon program is made of two cylindrical rods of $^{6}\text{LiD}$, 1.5 cm in radius and 60 cm in length, separated by 10 cm. The two cells are polarized via Dynamic Nuclear Polarization (DNP) at a temperature of 0.5 K and in a magnetic field of 2.5 T. Depending on the physics question under investigation, the cells can be polarized longitudinally or transversely with respect to the beam direction. The two cells are polarized in opposite directions to avoid systematic errors in the offline reconstruction; for the same reason the polarization of the cells is inverted every 8 hours. Because of technical problems, the COMPASS solenoid magnet was not completed in time, so the SMC target magnet, which has a lower acceptance than the COMPASS magnet ($\pm 70$ mrad instead of $\pm 160$ mrad), had to be reactivated. A polarization of 55% was achieved in the 2003 data run.

A different target will be used in the hadron program: the Primakoff and charm programs require a thin (2-3 mm) solid high-Z target. A precision vertex reconstruction is obtained with 3 or more silicon stations installed downstream. For the Primakoff program an additional veto box (a barrel of scintillators placed around the target) allows undesired events, where hadronic fragments are produced, to be vetoed. For the diffractive and central production programs a liquid hydrogen target is used. A recoil detector made of layers of scintillators will be installed around the target. The detector is needed to identify the recoil proton.

#### Magnets

COMPASS uses conventional dipole magnets to reconstruct the particle momenta. Tracking detectors placed upstream and downstream of the magnets permit the reconstruction of the deflected tracks. Knowing the properties of the magnetic field, it is possible to extract the momentum of the particles. The first magnet, SM1, has a central gap of $110 \times 153 \times 172\ \text{cm}^3$; for the hadron program the height of the gap will be reduced from 172 to 82 cm. SM1 has an integrated field of 1 Tm at 2500 A. The second magnet, SM2, has a gap of $400 \times 200 \times 100\ \text{cm}^3$ and a maximum integrated field of 5.2 Tm. During the 2003 run it was operated at 4.4 Tm at 4000 A.
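As a rough illustration of how an integrated field translates into a measurable deflection, the sketch below applies the standard small-angle relation $\theta \approx 0.2998 \cdot \int B\,dl / p$ (field integral in Tm, momentum in GeV/c) to a singly charged particle. The function name and the sample momenta are illustrative assumptions; only the field integrals are taken from the text.

```python
# Small-angle bending of a unit-charge particle in a dipole field.
# theta [rad] ~= 0.2998 * (integral of B dl [T m]) / p [GeV/c]

K_BEND = 0.2998  # GeV/c per (T m), from p = q * B * r in practical units


def deflection_mrad(field_integral_tm, momentum_gev):
    """Return the bending angle in mrad (small-angle approximation)."""
    return 1e3 * K_BEND * field_integral_tm / momentum_gev


if __name__ == "__main__":
    # Field integrals quoted in the text; the momenta are example values only.
    for name, bdl, p in (("SM1", 1.0, 10.0), ("SM2", 4.4, 100.0)):
        print(f"{name}: {deflection_mrad(bdl, p):.1f} mrad at {p:.0f} GeV/c")
```

With these example momenta both magnets bend the track by a few tens of mrad, which is the quantity the tracking detectors on either side of the magnet have to resolve.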

#### Tracking

COMPASS uses different tracking detectors along the entire spectrometer. They can be divided into three classes depending on their sizes: VSAT (Very Small Area Tracker); SAT (Small Area Tracker) and LAT (Large Area Tracker).

VSAT: For the region upstream of the target and for the area close to the scattered beam, where the particle density is high, detectors with high spatial resolution and small size are used. There are two different types of Scintillating Fiber stations (SCIFI-J and SCIFI-G), which additionally have an excellent time resolution of 400 ps and are used to assign the correct time to the event. In the muon setup, the silicon detectors (SI) are used only upstream of the target for beam reconstruction. For the hadron setup, in contrast, more detectors are foreseen downstream of the interaction point to improve the vertex reconstruction.

SAT: The SATs have a larger active area than the VSATs. They are Micro-Megas (MM) [28] and GEM [29] detectors: both are gaseous detectors with innovative charge amplification stages (a metallic micro-mesh and a perforated copper-clad polymer foil, respectively). The central part of these detectors, where the beam passes through, is usually deactivated to avoid discharges in the gas due to the high intensity.

LAT: The outermost area with respect to the beam direction, where the intensity is low and high resolution is not required, is covered by the LATs. These are conventional Drift Chambers (DC and W45), Multi Wire Proportional Chambers (MWPC), which are replaced by fast Photo Multiplier Tubes (PMT) for the second phase of the experiment in 2007, and Straw chambers.

Usually three detectors, one per type, are mounted close to each other, centered on the beam direction. This nested configuration is particularly efficient: a large area is covered to maximize the tracking efficiency, while the different spatial resolutions in regions of different intensity minimize the occupancy.

#### Particle Identification

In order to distinguish between pions, protons and kaons, COMPASS uses RICH detectors. A RICH detector measures the velocity of particles via the Cherenkov emission angle at their passage through the radiator material. Its purpose is to separate $\pi$, p and K with momenta up to 120 GeV/c. The photons are detected by PMTs with segmented fast photo-cathodes.

The energies of all particles, except muons and neutrinos, are measured by the calorimeters, where they are absorbed and deposit all their energy. Due to the high density of the calorimeter material, the particle creates a shower from which the position of the incident particle can be reconstructed. Calorimeters are also the only detectors in COMPASS which are sensitive to neutral particles. Each stage is equipped with electromagnetic and hadronic calorimeters, installed downstream of the RICH. The electromagnetic calorimeters ECAL-1 and ECAL-2 are made of lead glass blocks from the former GAMS experiment. The hadronic calorimeters HCAL-1 and HCAL-2 have a similar structure, consisting of sandwiches of scintillators and iron plates. The information from HCAL-1 and HCAL-2 is also used in the formation of the trigger. The high penetration capability of high-energy muons is used to identify them in the muon wall detectors MW1 and MW2. A particle is identified as a muon if it is detected in both layers of tracking detectors, upstream and downstream of a 1 m iron block.

#### Trigger

The trigger initiates the data acquisition. A trigger is identified via the geometrical properties of the scattered muon track and the energy deposited by the produced hadrons in HCAL-1 and HCAL-2. The muon track is reconstructed with dedicated scintillator hodoscopes placed all along the experiment. A different trigger calibration allows quasi-real photon events ($Q^2 < 1\ \text{GeV}^2$) and inclusive deep inelastic scattering events ($Q^2 > 1\ \text{GeV}^2$) to be distinguished. For the hadron program, additional information from the electromagnetic calorimeters will be used in the trigger.

Figure 2.5: Simplified COMPASS DAQ architecture.

In the DAQ architecture, seen in Fig. 2.5, the data from the detectors are first collected from the detector front-ends (e.g. CMAD ASIC in case of RICH-I of the second phase), then transmitted to the DAQ computers via data transmission ASICs, where they are combined into an event block and transferred to the central data recording.

### 2.2.3 RICH Detection Principle

One of the key components of the experimental apparatus is a ring imaging Cherenkov (RICH) detector, which performs particle identification by measuring the particle velocity. Combined with the momentum of the particle, measured in another detector, the velocity gives information about the particle mass. It is important to understand that the RICH sub-detector is part of a complex system and cannot perform particle identification without the information provided by the tracking sub-detectors.

The measurement of the velocity of a charged particle is based on the Cherenkov effect: a charged particle traveling in a dielectric medium (with refractive index n) faster than the speed of light in that medium, i.e. $v_p > c/n$, polarizes the atoms of the medium. The polarized atoms then emit Cherenkov radiation, which, because of interference, forms conical wave fronts with vertex angle $\theta$, see Fig. 2.6.

A common analogy is the sonic boom of a supersonic aircraft. The sound waves generated by the supersonic body do not move fast enough to get out of the way of the body itself. Therefore, the waves stack up and form a shock front. Similarly, a speed boat generates a large bow shock because it travels faster than waves can move on the surface of the water. In the same way, a charged particle generates a photonic shock-wave as it travels through the insulator faster than the speed of light in that medium [40]. The angle  $\theta$  is given by

$$\cos(\theta) = \frac{c}{V_p n(\lambda)} \tag{2.2}$$

where $V_p$ is the velocity of the particle and n is the refractive index of the medium, which is a function of the wavelength, $\lambda$, of the emitted radiation. The angle $\theta$ can take values between 0 and a maximum of $\arccos(1/n)$. The intensity and spectrum of the radiation are given by the *Frank-Tamm* relation

$$\frac{dN_{ph}}{dE} = \frac{\alpha}{\hbar c} z^2 L \sin^2(\theta) \tag{2.3}$$

where z is the charge of the particle and L is the length of the Cherenkov radiator. The number of photons is therefore a function of z, L and the angle $\theta$ given by Eq. 2.2:

$$N = N_0 z^2 L \sin^2(\theta) \tag{2.4}$$

$N_0$ is proportional to $1/\lambda^2$ and is the detector response parameter. Its dependence on the wavelength $\lambda$ of the emitted radiation is shown in Fig. 2.7. The number of emitted photons increases with their energy. $N_0$ is given by

$$N_0 = \left(\frac{\alpha}{\hbar c}\right) \varepsilon \Delta E \tag{2.5}$$

where:

$$\varepsilon \Delta E = \int (QRT) dE \tag{2.6}$$

Here $\varepsilon$ is the energy average of the detector efficiencies (Q is the quantum efficiency, T the transmission and R the mirror reflectivity) over the energy bandwidth $\Delta E$. As the charged particle passes through the radiator medium, Cherenkov conical wave-fronts are emitted along its whole trajectory inside the radiator.
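The relations of Eq. 2.2 and Eq. 2.4 can be explored numerically. The sketch below computes the Cherenkov threshold momentum, the emission angle and the photon yield relative to a $\beta = 1$ particle for pions, kaons and protons. The refractive index `N_GAS = 1.0015` is an assumed value of the order expected for a gaseous radiator and is not taken from the text; dispersion $n(\lambda)$ is ignored, and all function names are illustrative.

```python
import math

# Particle masses in GeV/c^2 (rounded PDG values).
MASSES = {"pi": 0.13957, "K": 0.49368, "p": 0.93827}

# Assumed refractive index of a gaseous radiator; not taken from the text.
N_GAS = 1.0015


def beta(p_gev, mass_gev):
    """Velocity in units of c for a particle of momentum p and mass m."""
    return p_gev / math.hypot(p_gev, mass_gev)


def threshold_momentum(mass_gev, n=N_GAS):
    """Momentum at which beta = 1/n, where Cherenkov emission starts."""
    return mass_gev / math.sqrt(n * n - 1.0)


def cherenkov_angle(p_gev, mass_gev, n=N_GAS):
    """Cherenkov angle of Eq. 2.2 in rad, or None below threshold."""
    cos_theta = 1.0 / (beta(p_gev, mass_gev) * n)
    return math.acos(cos_theta) if cos_theta <= 1.0 else None


def relative_yield(p_gev, mass_gev, n=N_GAS):
    """sin^2(theta) of Eq. 2.4 normalised to its value at beta = 1."""
    theta = cherenkov_angle(p_gev, mass_gev, n)
    if theta is None:
        return 0.0
    return (math.sin(theta) / math.sin(math.acos(1.0 / n))) ** 2


if __name__ == "__main__":
    for name, mass in MASSES.items():
        print(name, round(threshold_momentum(mass), 2), "GeV/c threshold")
```

Under this assumed index, heavier particles start radiating at higher momenta and, at a given momentum above threshold, emit at a smaller angle and with fewer photons, which is what the RICH exploits.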

In most cases, the Cherenkov ring image is formed from the conical wave-fronts in the focal plane of a focusing mirror (see Fig. 2.8) and the photons are detected by highly sensitive photo-detectors. In the case of a spherical mirror, the radius of the Cherenkov ring image $r_{c,im}$ is given by

$$r_{c,im} = \frac{R \tan(\theta)}{2} \tag{2.7}$$

where R is the radius of curvature of the mirror.
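A one-line numerical reading of Eq. 2.7 is sketched below. The mirror radius of 6.6 m and the 50 mrad angle in the example are assumed values chosen only to give a feeling for the scale of the ring; neither is quoted in the text.

```python
import math


def ring_radius(theta_rad, mirror_radius):
    """Eq. 2.7: r = R * tan(theta) / 2, in the same length unit as R."""
    return 0.5 * mirror_radius * math.tan(theta_rad)


if __name__ == "__main__":
    # Assumed example values: R = 6.6 m, theta = 50 mrad.
    print(f"ring radius: {100.0 * ring_radius(0.050, 6.6):.1f} cm")
```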

Figure 2.6: Huygens principle applied to wave-fronts emitted by a charged particle traveling at low velocity (left) and faster than light in the medium (right). From Eq. 2.2.

Particle identification is limited by the resolution of the Cherenkov angle measurement. The minimum difference in Cherenkov angle $\Delta \theta_{m1,m2}$ necessary for separation of two particles with masses $m_1$ and $m_2$ at momentum p with a number of sigmas $n_{\sigma}$ is given by:

$$\Delta \theta_{m1,m2} = \frac{(m_2^2 - m_1^2)}{2n_\sigma} \frac{\sqrt{N_0 L}}{p^2} \tag{2.8}$$

The required resolution  $\Delta\beta/\beta$  is then:

$$\left(\frac{\Delta\beta}{\beta}\right)_{m_1,m_2} = \frac{m_2^2 - m_1^2}{2p^2} \tag{2.9}$$

For a given Cherenkov angle  $\theta$ , a single photon resolution  $\sigma_{\theta}$  and a number of detected photons N, two particles can be separated with  $n_{\sigma}$  sigmas if their momenta are:

$$P_{m_1,m_2} \le \frac{1}{\sqrt{n_\sigma}} \sqrt{\frac{(m_2^2 - m_1^2)\sqrt{N}}{2\tan(\theta)\,\sigma_\theta}} \tag{2.10}$$
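To make Eq. 2.9 and Eq. 2.10 concrete, the sketch below evaluates the required velocity resolution and the upper momentum limit for $\pi$/K separation. The Cherenkov angle, single photon resolution, photon count and number of sigmas used in the example are assumed illustrative values, not figures from the text; masses are in GeV/c² and momenta in GeV/c.

```python
import math

M_PI = 0.13957  # GeV/c^2, rounded PDG value
M_K = 0.49368   # GeV/c^2, rounded PDG value


def required_beta_resolution(m1, m2, p):
    """Eq. 2.9: relative velocity resolution needed at momentum p."""
    return (m2 ** 2 - m1 ** 2) / (2.0 * p ** 2)


def max_separation_momentum(m1, m2, theta, sigma_theta, n_photons, n_sigma):
    """Eq. 2.10: highest momentum at which n_sigma separation is reached."""
    numerator = (m2 ** 2 - m1 ** 2) * math.sqrt(n_photons)
    denominator = 2.0 * math.tan(theta) * sigma_theta
    return math.sqrt(numerator / denominator) / math.sqrt(n_sigma)


if __name__ == "__main__":
    # Assumed example: theta = 55 mrad, sigma_theta = 1 mrad, N = 20, 3 sigma.
    limit = max_separation_momentum(M_PI, M_K, 0.055, 1e-3, 20, 3)
    print(f"pi/K separation up to about {limit:.0f} GeV/c")
    print(f"required dbeta/beta at 10 GeV/c: {required_beta_resolution(M_PI, M_K, 10.0):.2e}")
```

The limit grows only as the fourth root of the number of detected photons, so improving the single photon resolution $\sigma_\theta$ is the more effective lever in this expression.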



Figure 2.7: Number of Cherenkov photons as a function of their wavelengths [18].



Figure 2.8: RICH detection principle (left) and an on-line data quality monitoring tool [19] [44] displaying an actual Cherenkov ring acquired within a test beam at CERN (right).


# Chapter 3

# Design of CMAD

In nuclear and HEP experiments, the physical interface between the detector and the read-out system is established by the front-end (FE) chip, which is usually an ASIC. FE chips are the first to *see* the electronic outcome of the actual detection: they perform the first processing of the electric signal generated by the particle.

This chapter starts by presenting the architecture of the current ASIC, the MAD-4, and the limitations which motivated its replacement. Afterwards, the architecture of a new FE ASIC, the CMAD, developed for the RICH-I detector system of the COMPASS experiment, is described. The first order calculations, the high level behavioral model, the device level implementation and the relevant simulation results are presented.

## 3.1 Architecture of the MAD-4

MAD-4 has been developed to meet the front-end electronic needs of  $CMS^1$  barrel muon chambers [58]. It was adopted for reading out the existing RICH-I detector system of COMPASS experiment.

The task of the ASIC was to amplify signals picked up by chamber wires in CMS and by  $MWPCs^2$  in COMPASS, compare them against an *external* threshold and transmit the results to the acquisition electronics.

<sup>&</sup>lt;sup>1</sup>CMS (Compact Muon Solenoid) is one of the main experiments on LHC.

<sup>&</sup>lt;sup>2</sup>MWPC stands for Multi-Wire Proportional Chamber.



Figure 3.1: Architecture of the MAD-4.

The chip, built using  $0.8\mu m$  BiCMOS technology, provides 4 identical chains of amplification, discrimination and cable driving circuitry. It integrates a flexible channel enabling/disabling feature and a temperature probe for monitoring purposes.

The working conditions of the detector set requirements for high sensitivity and speed combined with low noise and low power consumption. As a fundamental requirement for a front-end, the threshold should be set as low as possible to improve efficiency and time resolution. A good *uniformity of amplification* between channels of different chips and a very *low offset* for the whole chain are also needed.

Fig. 3.1 shows the full chip architecture. Four identical analog chains are made of a charge preamplifier followed by a simple shaper with baseline restorer, whose output is compared against an external threshold by a latched discriminator; the output pulses are then stretched by a programmable one-shot and sent to an output stage able to drive long twisted pair cables with LVDS compatible levels. Control and monitoring features have been included in the chip: to mask noisy wires, each channel can be disabled at the shaper input resulting in little crosstalk to neighbors. An absolute temperature probe has been integrated in order to detect electronics failures and monitor environmental changes.

Two separate power supplies (5 V and 2.5 V) are used in order to reduce power drain and minimize interference between input and output sections. Particular care was taken with the layout and routing, and many pins have been reserved for power, input ground and analog ground.

To prevent latch-up events and improve crosstalk performance, guard ring structures are largely used to isolate sensitive stages like the charge preamplifier or complementary MOS devices.

## 3.1.1 Motivations for the Upgrade

MAD-4, adopted for the read-out of RICH-I sub-detector system of COMPASS experiment, has been used successfully for years. However, considering the upgrade of RICH-I [57] for the second-phase of the experiment, it has the following limitations:

1. Since it was adopted from another system, the front-end stage in particular is not optimized for RICH-I of COMPASS, leading to a relatively high level of input-referred noise. This prevents setting thresholds lower than a certain limit, degrading the performance especially in the case of the new system.
2. The threshold is set externally and globally within a chip, so all the channels on the die share the same threshold value. This requires external circuitry to generate the threshold and degrades channel equalization, because channel thresholds cannot be adjusted independently.
3. The processing speed of a single channel cannot sustain the 5 MHz rate required by the new setup.

The above limitations mostly originate from the state of the technologies used at the time of the system development and they motivate the design of a new front-end ASIC as detailed in this chapter.



Figure 3.2: Binary read-out architecture of a single channel in the CMAD.

## 3.2 Design of the CMAD

In order to improve the reconstruction efficiency of the RICH-I, an upgrade is under development [57]. The experience gained with the first physics runs has shown that a trigger rate of 100 kHz and a single channel rate of 5 MHz should be sustained in order to reach optimal performance. These tight requirements can be achieved by detecting the photons produced in the sensitive volume by photomultiplier tubes equipped with fast read-out electronics. The granularity of the system demands the use of compact multi-anode photo-multiplier tubes (MPTs). The increased event rate that the system has to cope with is one of the main reasons which motivated the development of a new front-end ASIC, the CMAD, presented in this section. Fabricated in a commercially available 0.35  $\mu m$  CMOS technology, the chip performs binary read-out of the MPT signals.

#### 3.2.1 Architecture

Fig. 3.2 shows the architecture of a single channel. Each processing channel features a low-noise trans-impedance amplifier followed by a shaper with 10 ns peaking time, a baseline holder (BLH), a comparator, a programmable one-shot to maintain the backward compatibility with the existing read-out system [59] and an LVDS driver.

The gain of each channel, which was fixed to 4 mV/fC in MAD-4, is now programmable. Two modes are available. In the *low gain mode*, the gain can be adjusted from 0.4 mV/fC to 1.2 mV/fC with an average step of 0.1 mV/fC. In the *high gain mode*, the gain can be programmed from 1.6 mV/fC to 4.8 mV/fC.



Figure 3.3: Charge sensitive amplifier, CSA.

This tunability makes it possible to compensate, at least partially, for the channel-to-channel gain variation of the MPTs. Additionally, the threshold of each comparator can be adjusted on a channel-by-channel basis via a local 10-bit digital-to-analog converter (D/A). The gain of the front-end and the D/A codes are programmed through a digital control unit using the I2C standard.
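The programmable ranges quoted above can be tabulated with a short sketch. It assumes ideal, evenly spaced steps in the low gain mode and a factor of four between the two modes (inferred from 1.6/0.4 and 4.8/1.2); the actual register encoding is not described here. The helper `threshold_charge_fc` and its `lsb_mv` argument are hypothetical: the D/A step size is not given in the text, and the threshold is assumed to be measured from the baseline.

```python
LOW_GAIN_MIN = 0.4    # mV/fC, from the text
LOW_GAIN_MAX = 1.2    # mV/fC, from the text
LOW_GAIN_STEP = 0.1   # mV/fC, average step quoted in the text
HIGH_GAIN_FACTOR = 4  # assumed ratio between high and low gain modes
DAC_BITS = 10         # resolution of the per-channel threshold D/A


def gain_settings(high_gain=False):
    """List the nominal gain values (mV/fC) of one mode, assuming even steps."""
    n_steps = round((LOW_GAIN_MAX - LOW_GAIN_MIN) / LOW_GAIN_STEP)
    scale = HIGH_GAIN_FACTOR if high_gain else 1
    return [round(scale * (LOW_GAIN_MIN + k * LOW_GAIN_STEP), 3)
            for k in range(n_steps + 1)]


def threshold_charge_fc(dac_code, lsb_mv, gain_mv_per_fc):
    """Input charge (fC) equivalent to a D/A code; lsb_mv is hypothetical."""
    if not 0 <= dac_code < 2 ** DAC_BITS:
        raise ValueError("code outside the 10-bit range")
    return dac_code * lsb_mv / gain_mv_per_fc


if __name__ == "__main__":
    print("low gain :", gain_settings())
    print("high gain:", gain_settings(high_gain=True))
```

For a fixed D/A code, a higher channel gain corresponds to a lower threshold in terms of input charge, which is how gain and threshold together can be used to equalize channels.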

The Charge Sensitive Amplifier (CSA) used as the first element of the chain is shown in Fig. 3.3. It consists of a cascode amplifier with a capacitive feedback, $C_f$, a voltage buffer, B, and a resistive feedback, $R_f$, as resetting device. The voltage buffer, B, is placed so as to overcome the drop in open-loop low-frequency voltage gain caused by the loading effect of $R_f$. The voltage buffer also avoids a direct coupling between $C_f$ and a possible input capacitance of the following stages.

The fast shaper shown in Fig. 3.4 is based on a class AB<sup>3</sup> operational amplifier [84] around which two feedback networks are implemented. A fast path (shaper) performs high frequency filtering while a slow baseline holder (BLH) feedback provides the AC coupling with the previous stage and guarantees baseline stabilization [68].

A fast unity gain buffer with limited slew rate is used in the baseline control loop. Fast signals at the output of the shaper are clipped before arriving at the trans-conductor stage denoted as $G_m$. The baseline stabilization circuit or baseline holder (BLH) is designed to reduce the baseline shift to less than 3 mV for output pulses with a 3 V amplitude at 10 MHz rate.

<sup>3</sup>Amplifier circuits are classified as A, B, AB and C for analog designs, and class D and E for switching designs. For the analog classes, each class defines what proportion of the input signal cycle is used to switch on the amplifying device.

Figure 3.4: Architecture of the shaper with BLH.

As seen in Fig. 3.2, the first stage (CSA) drives the second one (shaper) with a current signal through an adjustable resistive connection. The value of this resistor can be programmed in order to allow for an additional tuning of the gain by a factor of four. According to the analytical model in [40] developed for the first two stages of the channel, ignoring the feedback resistor  $R_f$  and voltage buffer B, the input integrator has a transfer function of the form

$$\frac{V_{out}}{I_{in}}(s) = \frac{A}{s(C_d + C_f) + sAC_f} \tag{3.1}$$

where A is the gain of the preamplifier used in the CSA. The reasonable assumption $A \gg (C_d + C_f)/C_f$ yields

$$V_{out}(s) = \frac{1}{C_f} \frac{I_{in}(s)}{s} \tag{3.2}$$

where $I_{in}$ is the current generated by the ideal current source which represents a particle passage in the detector model. The inverse Laplace transform therefore gives the approximate time domain output of the CSA as

$$V_{out}(t) = \frac{Q(t)}{C_f} \tag{3.3}$$

confirming that the stage is an integrator. A similar simplification can be followed to obtain the transfer function of the shaper with BLH. The $G_m$ block has a frequency dependent transfer function f(s). Ignoring the slew rate limited buffer and reverse transfer functions, the forward transfer functions of the second stage are

$$T_{low}(s) = \frac{1}{f(s)} \qquad \text{and} \qquad T_{high}(s) = T_{sh}(s) \tag{3.4}$$

where $T_{low}(s)$ and $T_{high}(s)$ are the transfer functions of the second stage at low and high frequencies and $T_{sh}(s)$ is the transfer function of the shaper without BLH feedback. f(s) is low pass, so the first transfer function in Eq. 3.4 is a high pass filter. The $G_m$ stage has a narrow bandwidth such that the fast input signals can pass through the whole block without being affected. However, as a consequence of the high-pass filtering behavior introduced by the $G_m$ stage, a baseline shift has to be expected when the input rate increases. For that reason, a slew rate limited non-linear buffer (SRLB) is inserted before the $G_m$ stage. This block dynamically clips the pulses to be processed by the $G_m$ block and significantly reduces the area of large and fast signals, preventing low frequency baseline fluctuations. The transfer function of the front-end circuit (CSA+Shaper) is obtained as

$$T(s) = \frac{T_{CSA}(s)\,T_{sh}(s)}{R_{1\,\text{or}\,2}} \tag{3.5}$$

on which a detailed analysis and behavioral simulation results are presented in [40].
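The quality of the ideal-integrator approximation of Eq. 3.2 can be checked numerically. Dividing Eq. 3.1 by Eq. 3.2 gives the frequency independent factor $1/\left(1 + (C_d + C_f)/(A C_f)\right)$, which the sketch below evaluates together with the ideal output step $Q/C_f$ of Eq. 3.3. All component values in the example (feedback capacitance, $C_d$, taken here to be the detector capacitance, and open-loop gain A) are assumed for illustration and are not design values from the text.

```python
def ideal_step_mv(charge_fc, c_f_ff):
    """Eq. 3.3: output step Q / C_f; fC divided by fF gives volts, returned in mV."""
    return 1e3 * charge_fc / c_f_ff


def finite_gain_factor(a, c_d_ff, c_f_ff):
    """Ratio of Eq. 3.1 to Eq. 3.2 for a finite preamplifier gain A."""
    return 1.0 / (1.0 + (c_d_ff + c_f_ff) / (a * c_f_ff))


if __name__ == "__main__":
    # Assumed example values: Q = 10 fC, C_f = 250 fF, C_d = 5 pF, A = 1000.
    step = ideal_step_mv(10.0, 250.0)
    factor = finite_gain_factor(1000.0, 5000.0, 250.0)
    print(f"ideal step {step:.1f} mV, finite-gain factor {factor:.4f}")
```

The factor approaches one as A grows, which is the condition $A \gg (C_d + C_f)/C_f$ used to pass from Eq. 3.1 to Eq. 3.2.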

The peaking time at the output of the shaper is 10 ns. The system is designed to cope with a rate in excess of 5 MHz/Ch. After the comparator, the output pulses are stretched by a programmable one-shot to maintain backward compatibility with the existing read-out system and sent to an output stage capable of driving long twisted pair cables with LVDS compatible levels (Fig. 3.2).

An important issue in the operation of the shaper with the BLH is the selection of its reference level, $V_{ref\_OTA}$. The DC input of the shaper, denoted as $in_s$ in Fig. 3.4, is nominally maintained at a certain level by adjusting its reference, $V_{ref\_OTA}$, while the DC output of the preamplifier is equal to the $V_{gs}$ of the input device, e.g. around 0.6 V. If the voltages at the two ends of the resistor R (driving the signal into the second stage in Fig. 3.4) differ, a current flows accordingly. This additional current must be compensated by the $G_m$ block. The fact that the $G_m$ stage can only sink current but cannot inject it imposes a limitation on the value of $V_{ref\_OTA}$.

The $G_m$ stage should always operate in the condition in which it sinks a compensation current. If $V_{ref\_OTA}$, or equivalently the DC level of the shaper input (denoted as $in_s$ in Fig. 3.4), is set too high, the $G_m$ stage would need to inject current into the input of the shaper, for which it is not designed. This would cause the $G_m$ block to shut down, breaking the AC-coupling and establishing a DC-coupling between the CSA and the shaper. The practical effect is the removal of the BLH from the channel, which means signal processing failure. If, on the other hand, $V_{ref\_OTA}$ is set too low, the $G_m$ block can continue operating but it needs to sink a relatively large current, resulting in noisy operation. These considerations suggest that, for the reference voltage $V_{ref\_OTA}$, there is an upper limit above which the BLH stops working and a lower limit below which the channel noise is unacceptable.

Concluding the above discussion, $V_{ref\_OTA}$ should be set between the two limits so as to provide the $G_m$ function enough margin. In CMAD the references $V_{ref\_OTA}$ and $V_{ref\_BLR}$, together with the threshold of the comparator, are controlled independently to provide the flexibility that guarantees proper functioning of the channel.

To arrive at the conclusion of the above discussion more quantitatively, let us consider the two possible conditions where $V_{ref\_OTA}$ is set too low and too high. Ignoring the SRLB for simplicity, the $G_m$ block can be thought of as an opamp controlling the gate of an nMOS transistor, T1, as depicted in Fig. 3.5. Functional components are enclosed in red dashed squares and the important nodes and currents are shown in blue.

Let us assume the reference is set to a certain voltage, $V_{ref\_OTA} = V_{ota}$. Ignoring the offset of the shaper core amplifier, the node c will be at the same voltage $V_{ota}$, leading to a potential difference of $V_R = V_b - V_{ota}$ across the resistor R. A current $I_r = V_R/R$ then flows through R. The input of the shaper (c) is the gate of the input transistor of its first stage. Therefore $I_r$ can flow either fully through $R_{sh}$, so that $I_r = I_{sh}$, raising the node d, or fully through the transistor T1, or it can use both paths as in the nominal condition. The current which flows through T1 is treated as the compensation current provided by the $G_m$ block. It should be noted that there is no DC current path back into the CSA, and this is the reason the DC level of the CSA input is said to be equal to the $V_{gs}$ of the input transistor.

Figure 3.5: Example configuration for the FE.

For the exercise, it will be assumed that $V_{ref\_BLR}$, and thus the node d ($V_{out\_SHP}$), is set to 2.7 V. This is the requested baseline, while the DC level of the CSA input ($V_{in}$) is at $V_a = V_b = 0.6\ \text{V}$. The nominal $V_{dd}$ of the technology is 3.3 V. In the exercise, we first assume a too low voltage level for $V_{ref\_OTA}$, and thus for the node c, and then the opposite case where $V_{ref\_OTA}$ is chosen too high. For these two conditions, the behavior of the circuit of Fig. 3.5 is examined.

First, assuming a $V_{ota}$ lower than $V_b = V_a = 0.6\ \text{V}$, e.g. 0.3 V, the current $I_r = \frac{0.6 - 0.3}{1000}\ \text{A} = 300\ \mu\text{A}$ flows into the shaper. For the node d to stay at 2.7 V, the current which should flow through $R_{sh}$ is $I_{sh} = \frac{2.7 - 0.6}{20000}\ \text{A} = 105\ \mu\text{A}$. To keep $V_d$ at 2.7 V, the additional current $I_{gm} = 300 - 105 = 195\ \mu\text{A}$ must be sunk by the transistor T1. If $I_{gm}$ is this large, and assuming T1 is able to sink this amount of current, the operation is successful but is expected to be noisy. Therefore the potential difference $V_b - V_c$ should not be large.

On the other hand, assuming a $V_{ota}$ higher than $V_b = V_a = 0.6\ \text{V}$, e.g. 0.8 V, the current $I_r = \frac{0.6 - 0.8}{1000}\ \text{A} = -200\ \mu\text{A}$ flows back from the shaper, through $R_{sh}$. This would pull the node d up. In this case, T1 shuts down since there is no additional current it can sink to regulate the baseline. Moreover it cannot inject current. As a result, $V_d = 0.6\ \text{V} + 200\ \mu\text{A} \cdot 20\ \text{k}\Omega = 4.6\ \text{V}$ is expected, so the output of the shaper saturates at 3.3 V, the $V_{dd}$ of the circuit. This is interpreted as the removal of the BLH feedback: the baseline at the output of the shaper is no longer regulated.
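The two cases of this exercise can be reproduced with a few lines of code. The sketch below follows the simplified bookkeeping of the text, with $R = 1\ \text{k}\Omega$ and $R_{sh} = 20\ \text{k}\Omega$ read off the denominators used above. Intermediate settings, where $I_r$ is positive but smaller than $I_{sh}$, are not discussed in the text; treating them like the second case is an assumption of this sketch, as are the function and key names.

```python
V_B = 0.6         # V, DC output of the CSA (V_gs of the input device)
V_BASELINE = 2.7  # V, requested baseline at node d
V_DD = 3.3        # V, supply of the technology
R = 1_000.0       # ohm, coupling resistor (the 1000 in the text)
R_SH = 20_000.0   # ohm, shaper feedback resistor (the 20000 in the text)


def blh_operating_point(v_ota):
    """Currents (A) and output level (V) for a given V_ref_OTA setting."""
    i_r = (V_B - v_ota) / R            # current driven through R into node c
    i_sh = (V_BASELINE - V_B) / R_SH   # current the text assigns to R_sh
    i_gm = i_r - i_sh                  # current T1 would have to sink
    if i_gm >= 0.0:
        return {"i_r": i_r, "i_sh": i_sh, "i_gm": i_gm,
                "regulated": True, "v_d": V_BASELINE}
    # T1 cannot source current: the loop opens and the output drifts upwards,
    # clipped at the supply, following the estimate used in the text.
    v_d = min(V_DD, V_B + abs(i_r) * R_SH)
    return {"i_r": i_r, "i_sh": i_sh, "i_gm": 0.0,
            "regulated": False, "v_d": v_d}


if __name__ == "__main__":
    for v_ota in (0.3, 0.8):
        point = blh_operating_point(v_ota)
        print(v_ota, {k: (round(v * 1e6, 1) if k.startswith("i_") else v)
                      for k, v in point.items()})
```

Sweeping `v_ota` between the two example values gives a quick view of how much current T1 has to sink before the loop opens.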

## 3.2.2 Transistor Level Implementation

The function of the channel architecture presented so far could also be achieved by establishing the ac-coupling in a lumped capacitive manner, as seen in Fig. 3.6. The capacitor C ac-couples the two stages, obviating the need for the SRLB. However, this capacitor, together with the parasitic impedance at the input of the shaper, denoted as Z, would form a C-R filter. In this case the *unipolar* pulses at the output of the CSA would be converted to *bipolar* pulses whose tail lengths depend on the parasitic time constant at the input node of the shaper. If the channel input rate is low enough, this simple lumped coupling performs well. However, at the required processing rate of more than 5 MHz/Ch, pulses begin to pile up on each other's tails, leading to baseline shifts. Taking the behavioral model [40] and the typical values of C into account, this lumped ac-coupling approach also requires a larger die area. Therefore the CMAD employs a more involved active ac-coupling approach, which turns out to be better suited to high integration.
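The rate dependence of the baseline with lumped ac-coupling can be illustrated with a minimal discrete-time model of a first-order C-R high-pass filter driven by a periodic train of unipolar pulses. This is a sketch under stated assumptions: unit-amplitude rectangular pulses of 20 ns stand in for the CSA output, and the 1 µs time constant is a hypothetical parasitic value, not a figure from the text.

```python
import math


def cr_highpass(samples, dt_s, tau_s):
    """Discrete first-order C-R high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    a = tau_s / (tau_s + dt_s)
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        prev_y = a * (prev_y + x - prev_x)
        prev_x = x
        out.append(prev_y)
    return out


def settled_baseline(rate_hz, tau_s=1e-6, width_s=20e-9, dt_s=1e-9):
    """Output level just before a new pulse, once the filter has settled."""
    period_n = round(1.0 / (rate_hz * dt_s))  # samples per pulse period
    width_n = round(width_s / dt_s)           # samples per pulse
    # Simulate at least 20 time constants so the start-up transient has died out.
    n_periods = math.ceil(20.0 * tau_s / (period_n * dt_s))
    train = [1.0 if (i % period_n) < width_n else 0.0
             for i in range(n_periods * period_n)]
    return cr_highpass(train, dt_s, tau_s)[-1]


if __name__ == "__main__":
    for rate in (1e6, 5e6):
        print(f"{rate / 1e6:.0f} MHz: baseline {settled_baseline(rate):+.3f} of pulse height")
```

In this model the baseline seen by a new pulse moves further below zero as the rate increases, because each pulse arrives on the undershoot tails of its predecessors.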

This subsection will briefly present the transistor level implementation details of the building blocks the CMAD employs.



Figure 3.6: Lumped ac-coupling between the first two stages.

#### **Charge Sensitive Amplifier**

The first stage of the channel is seen in Fig. 3.7. The CSA starts at a cascode amplifier (M1, M2) with a p-channel MOS cascode current-mirror load (M3, M4). This amplifier, together with the voltage buffer M9, can be thought of as the basic amplifier involved in the feedback loop whose feedback network consists of the resistor Rf and the capacitor Cf.

The cascode configuration is made up of a common-source amplifier stage followed by a common-gate stage and is widely used because it increases the amplifier output resistance and improves its frequency behavior with respect to a single-transistor amplifier stage. The cascode connection allows us to achieve, from a small-signal standpoint, a very high load resistance. This is represented by the output resistance of the current mirror in parallel with the output resistance of the block consisting of M1 and M2. Hence M2 increases the output resistance of this block with the sole purpose of preventing it from becoming significantly lower than that of the current mirror. In fact, the parallel combination of two resistors results in a resistance which is always lower than the smaller of the two. Furthermore, cascoding M1 causes only a small decrease in the trans-conductance of the basic amplifier, as shown in [40].

A further biasing branch consisting of the p-channel MOS cascode current-mirror (M6, M7, M8) causes an additional bias current of IREF2 to flow through M1 only, increasing its small-signal trans-conductance<sup>4</sup> without decreasing the output resistances of M2, M3 and M4<sup>5</sup>. Such a decrease would occur if one simply increased IREF1.

The transistor M6, which acts as cascode for M7, is added in order to increase the output resistance of the simple current mirror (M7, M8) such that this resistance does not decrease the output resistance of M1 with which it is in parallel.

Finally, the transistor M9 together with its active load M10 is used in the common-drain configuration and acts as a voltage buffer, B in Fig. 3.3.

<sup>&</sup>lt;sup>4</sup>which is directly proportional to its bias current since M1 operates in weak inversion

<sup>&</sup>lt;sup>5</sup>which are, on the other hand, inversely proportional to the bias current



Figure 3.7: Transistor level implementation of the first stage, CSA.

December 17, 2007

#### Slew Rate Limited Non-Linear Buffer

At an output stage, the maximum rate at which the load capacitance,  $C_L$ , can be charged by a current, I, is called the *slew rate*, SR, and is given by:

$$SR = \frac{I}{C_L} = \frac{dV}{dt} \tag{3.6}$$
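As a purely illustrative use of Eq. 3.6, one can take the slew rate and load quoted for the shaper in Table 3.1 (500 V/$\mu$s at 10 pF), rather than any design value of the buffer itself, and solve for the current that the output stage must be able to deliver:

$$I = SR \cdot C_L = 500\ \mathrm{V/\mu s} \times 10\ \mathrm{pF} = 5\ \mathrm{mA}$$

Read in the other direction, the same relation shows that limiting the available current limits the slew rate on a given load.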

Fig. 3.8 shows the transistor level implementation of the circuit. An nMOS source-coupled pair with a pMOS current-mirror load implements the differential input stage of the buffer whereas a common-drain configuration (source follower) consisting of an nMOS with an nMOS current-mirror load realizes the single-ended output stage. The capacitive load  $C_L$  is connected between the source of M7 and ground. The source follower is used to drive this capacitive load due to its intrinsic feature of providing the asymmetrical slew-rate limitation analyzed in [40]. On the other hand the pMOS current mirror consisting of transistors M8, M9 performs the function of limiting the current that M7 would be able to pull down for charging the output capacitor. Unity-gain negative feedback is provided by connecting the inverting terminal of the input differential pair to the output of the circuit.

The source follower consisting of a pMOS with a pMOS current-mirror load turns out to be particularly useful to overcome both the problem of the *parasitic feed-through* and the problem of the maximum output swing. Concerning the output swing, the dc output voltage of the circuit can even be set to 3V since the magnitude of the gate-source voltage of M13 is approximately equal to the gate-source voltage of M7 whereas the sign is opposite so that the source voltage of M4 is nearly equal to the output voltage. As a result, the transistor M4 can still be biased in the active region even if the dc output voltage is at 3 V. The source follower is able to reduce the capacitive feed-through of the input signal avoiding the direct coupling between the output capacitor  $C_L$  and the intrinsic gate-source capacitances of the input pair. Moreover the feed-through of the input signal now passes through the source of M13 whose bias current can be freely set because it does not take part in the slew-rate limitation process. Hence one can increase the bias current of M13 in order to reduce its output resistance  $(r_0 = 1/g_m)$  to provide a low-resistance path connecting the output node to ground.



Figure 3.8: Transistor level implementation of the slew rate limited buffer, SRLB.


#### Shaper Core Amplifier

The requirements for the shaper depend on the output characteristics of the preamplifier stage, as follows. The preamplifier output signal can reach its maximum value within a period of the order of 10 ns. Moreover, the shaper also has an amplification function and must deliver high signal levels. Since the large slew rate at the output of the preamplifier must be followed by the shaper, so that the channel operation is not affected by the so-called *slew rate limitation problem*, a high output driving capability is required at the shaper stage.

Low noise, and thus a low quiescent current, is also needed in the shaper. This enables the next stage, the comparator, to set lower thresholds, leading to an improved front-end performance and therefore a high overall channel efficiency.

When a higher output current is desired while the quiescent current is kept low, a current-efficient class AB biasing must be incorporated at the output stage [84]. Fig. 3.9 shows the simplified architecture employed. In this simplified architecture, the input differential pair steers the current from one branch to the other, changing the currents flowing through the diode-connected transistors, since the current flowing through the *series of batteries* is constant. This biases the output transistors in class AB. What is represented as a floating series of batteries tries to maintain the potential difference between the gates of the output transistors and can be implemented as a mesh formed by a head-to-tail connected pair of n- and p-MOS devices [84], as seen in Fig. 3.9.



Figure 3.9: Class AB biased OTA.

Fig. 3.10 shows the transistor level implementation of the class AB OTA used as the core amplifier of the shaper, adopted from [62]. The output stage, formed by M11 and M12, is feed-forward biased in class AB by a mesh of head-to-tail connected transistors, M13 and M14, which are enclosed by the red dashed square in Fig. 3.10. The output stage is a push-pull configuration. At first sight, the bias connections to the sources of M13 and M14 seem to lower the impedance at the gates of the output transistors, M11 and M12. However, the drain connections of M13 and M14 cancel the low source impedances through a positive feed-back loop for the current mirror driving voltages. They do not provide additional currents but just switch the path through which the current flows, biasing the output transistors in class AB.

The mesh is incorporated into the folded cascode with M21 through M24. This has the important advantage that no additional bias currents have to be used for class AB biasing. The offset and the noise of these extra bias currents would otherwise have been added to the offset and the noise of the input stage.

To further reduce the offset and the noise contribution of the folded cascode stage, the mirror connection is placed at the upper side instead of at the lower side. This way, the currents in the folded cascodes are reduced by a factor of two with respect to a general application, where the mirror connection is usually placed at the bottom side, as detailed in [84]. The specifications of the implemented shaper of Fig. 3.10 are given in Table 3.1.

Table 3.1: Shaper performance.

| Parameter                     | Value                        |
|-------------------------------|------------------------------|
| Peaking time                  | $10\text{-}20~\mathrm{ns}$   |
| Linear output swing           | 2.9 V                        |
| Nonlinearity                  | $1.5 \ \%$                   |
| Slew rate @10 pF load         | $500 \text{ V}/\mu \text{s}$ |
| Noise @10 pF capacitance      | $1450 \ e^-$                 |
| Power consumption             | $3.3\mathrm{mW}$             |


Figure 3.10: Transistor level implementation of shaper core amplifier.


#### **Digital-to-Analog Converter**

Up to this point in the processing chain, the input charge generated by the detector is integrated and shaped while its baseline is preserved; the signal is therefore ready to be compared against a locally generated threshold so that the comparator stage can make a binary decision. The reference for the comparator is generated by a local 10-bit $D/A^6$.

In the binary architecture given in Fig. 3.2, a global D/A cannot be used, since each of the read-out channels needs its own comparator that operates independently from the rest. This calls for a low-power, small-area D/A architecture, since it is used in each read-out channel and thus more than once per chip.

Conceptually, the simplest D/As use a binary-weighted architecture, where n binary-weighted elements (current sources, resistors or capacitors) are combined to provide an analog output (n = D/A resolution). Digital encoding circuits are minimized, but the difference between the MSB and the LSB<sup>7</sup> weights increases with increasing resolution, making accurate element matching difficult.

Among others like Kelvin divider or segmented architectures, the R-2R, or ladder, architecture relaxes component-matching requirements since only two component values are required in a 2:1 ratio. The R-2R architecture can be configured as a voltage- or current-mode D/A, together with different advantages and disadvantages.

A drawback of a current-mode R-2R architecture is the inversion introduced by the opamp which usually exists as an output current-to-voltage converter. Another disadvantage is the complicated stabilization of the opamp due to the fact that the D/A output impedance varies with digital input code. Current mode operation also results in higher glitch, since the switches connect directly to the output.

An advantage of the voltage-mode R-2R configuration is that the output has a constant impedance, which simplifies amplifier stabilization. The glitch generated by switch capacitance is also minimized. The drawback of the voltage-mode R-2R configuration

<sup>&</sup>lt;sup>6</sup>Another D/A which is identical to the one presented in this section is also used to set the reference for the  $G_m$  stage within BLH block.

<sup>&</sup>lt;sup>7</sup>MSB and LSB stand for most and least significant bit, respectively.

is that the reference input impedance varies widely, so a low-impedance reference must be used. Also, the switches operate from ground to  $V_{ref}$ , restricting the allowed range of the reference.

The CMAD implementation employs Low Drop-Out regulators (LDOs) for setting the reference voltage and bias current of the D/A, together with some other blocks. The technology used ($0.35\ \mu m$) has a relatively high analog performance compared to more recent technologies with smaller feature sizes, so the amplifier compensation is easily achievable over the whole operating range. A relatively high matching accuracy is also feasible with proper layout.

The performance of interest does not relate to high-speed measures like SFDR<sup>8</sup> and IM<sup>9</sup> distortion; instead, the integral and differential non-linearities (INL and DNL) are of interest. Since different circuit architectures behave differently with respect to these two metrics, a topological comparison between different circuits is useful for making a better decision.



Figure 3.11: Binary weighted (left) and thermometer coded (right) architectures for ideal comparison.

Fig. 3.11 shows an example of two different architectures for INL and DNL comparison. To compare the binary weighted and thermometer coded architectures, BWA and TCA respectively, a C++ program was developed in the ROOT environment [82]. N being the number of bits, $2^N$ unit current sources are created. The currents they provide are drawn randomly from a Gaussian distribution with a sigma of 0.02

<sup>&</sup>lt;sup>8</sup>SFDR stands for Spurious Free Dynamic Range.

<sup>&</sup>lt;sup>9</sup>IM stands for Inter Modulation.

separately for each unit current source. These sources are used to form both the D/As in BWA and TCA.

In the BWA, the first $2^{N-1}$ unit current sources are summed to form the Most Significant Bit (MSB), the sum of the next $2^{N-2}$ unit sources is used as the next bit below the MSB, and so on. In the TCA, every increment in the input digital code switches an additional unit current source to the output; it therefore needs a binary-to-thermometer converter to drive the unit current sources properly. Both architectures are seen in Fig. 3.11. Fig. 3.12 shows the simulation results. Consistent with the theoretical expectations [83], even though the INL variations of both architectures are almost identical and equal to $\sqrt{2^N}\sigma = 32\sigma$, the TCA exhibits a DNL ($\sigma$) that is 32 times better than that of the BWA ($32\sigma$).
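The comparison described above can be sketched in a few lines. The following is not the ROOT/C++ program used for Fig. 3.12 but a minimal Python re-creation of the same idea, under stated assumptions: unit currents are normalised to 1, the LSB is taken as the average step, and all function names are hypothetical. Since the binary groups only need $2^N - 1$ units, the last of the $2^N$ generated sources is simply left unused here.

```python
import random
import statistics


def make_units(n_bits, sigma, rng):
    """Draw 2**n_bits unit currents, nominal value 1, Gaussian spread sigma."""
    return [rng.gauss(1.0, sigma) for _ in range(2 ** n_bits)]


def thermometer_levels(units, n_bits):
    """TCA: each code increment switches in exactly one more unit source."""
    levels = [0.0]
    for k in range(1, 2 ** n_bits):
        levels.append(levels[-1] + units[k - 1])
    return levels


def binary_levels(units, n_bits):
    """BWA: bit b is a fixed group of 2**b units; the MSB takes the first 2**(n_bits-1)."""
    weights = {}
    start = 0
    for b in range(n_bits - 1, -1, -1):
        weights[b] = sum(units[start:start + 2 ** b])
        start += 2 ** b
    return [sum(w for b, w in weights.items() if (code >> b) & 1)
            for code in range(2 ** n_bits)]


def dnl_inl(levels):
    """Return (DNL, INL) in LSB, using the average step as the LSB."""
    lsb = (levels[-1] - levels[0]) / (len(levels) - 1)
    dnl = [(levels[k] - levels[k - 1]) / lsb - 1.0 for k in range(1, len(levels))]
    inl = [(levels[k] - levels[0]) / lsb - k for k in range(len(levels))]
    return dnl, inl


def compare(n_bits=10, sigma=0.02, trials=20, seed=1):
    """Average the worst |DNL| and |INL| of both architectures over several trials."""
    rng = random.Random(seed)
    out = {"BWA": {"dnl": [], "inl": []}, "TCA": {"dnl": [], "inl": []}}
    for _ in range(trials):
        units = make_units(n_bits, sigma, rng)  # same sources feed both D/As
        for name, builder in (("BWA", binary_levels), ("TCA", thermometer_levels)):
            dnl, inl = dnl_inl(builder(units, n_bits))
            out[name]["dnl"].append(max(abs(x) for x in dnl))
            out[name]["inl"].append(max(abs(x) for x in inl))
    return {name: {k: statistics.mean(v) for k, v in d.items()}
            for name, d in out.items()}


if __name__ == "__main__":
    for name, result in compare().items():
        print(name, result)
```

Building both converters from the same list of unit sources, as the text describes, means that any difference in DNL comes from the switching scheme alone and not from a different random draw.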

An intuitive explanation to the above observed result would be the following: in the case of MSB transitions of BWA, a big current is turned off *while* another big current is turned on, leading to possibly a big difference, considering the fact



Figure 3.12: Comparison results for 10-bit BWA and TCA showing the INLs (left column) and DNLs (right column) for both architectures.



Figure 3.13: 10-bit transistor-only R-2R architecture.

that both of the currents are formed by summing up the individual contributions of unit current sources. The deviations of all the individual current sources involved contribute to the final result, leading to a larger error in the step. However, in the case of a TCA, every increment in the digital word switches only a single current source to the output, so the standard deviation of only a single current source contributes to the resulting step, leading to a better DNL behavior. High-performance hybrid D/As can be constructed by using more than one architecture for different bits, to benefit from the advantages of each. A similar analysis has been detailed for an implementation of a 10-bit segmented D/A in [83].

An important concern is also the output impedances driving the comparator, namely the outputs of the shaper and the D/A. For proper functioning of the comparator input stage which is basically a differential pair, it is desired to equalize these impedances. The shaper has a low output impedance, thus requiring the same for the D/A.

Concerning the above discussion, a transistor-only current-mode R-2R architecture is a suitable solution, as it is composed only of transistors, which are compact and consume very little power. Fig. 3.13 and Fig. 3.14 show, respectively, the conceptual architecture of the small-area, low-power 10-bit D/A used for setting the threshold of the comparator [85] and its layout. In such an architecture, the transistors in the ladder do not necessarily emulate identical resistor values; instead, successful operation is based on the linear current division principle [69]. The accuracy of the division technique depends on the matching of the characteristic I-V curves of the two transistors, not on their linearity [73].

Figure 3.14: Layout $(140 \times 620 \mu m^2)$ of the 10-bit transistor-only R-2R D/A, where the thick yellow layers on the top and on the bottom show the channel boundary.

In Fig. 3.13, the output voltage,  $V_{out}$ , is dependent on the current flowing through the feedback resistor,  $R_f$ , such that

$$V_{out} = -i_{TOT} \cdot R_f \tag{3.7}$$

where  $i_{TOT}$  is the sum of the currents selected by the digital input as:

$$i_{TOT} = \sum_{k=0}^{N-1} \frac{D_k \cdot V_{ref}}{2 \cdot R \cdot 2^{N-k}} \tag{3.8}$$
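Eqs. 3.7 and 3.8 can be evaluated directly for any input code. The sketch below implements them exactly as written, so that $k = N-1$ is the most significant bit; the function names and the numerical values of $V_{ref}$, $R$ and $R_f$ are hypothetical and serve only to exercise the formulas.

```python
def ladder_current(code, n_bits, v_ref, r):
    """Eq. 3.8: sum over bits k of D_k * V_ref / (2 * R * 2**(N - k))."""
    total = 0.0
    for k in range(n_bits):
        d_k = (code >> k) & 1  # D_k, with k = 0 the least significant bit
        total += d_k * v_ref / (2.0 * r * 2 ** (n_bits - k))
    return total


def dac_output(code, n_bits, v_ref, r, r_f):
    """Eq. 3.7: the opamp converts the summed current into an inverted voltage."""
    return -ladder_current(code, n_bits, v_ref, r) * r_f


if __name__ == "__main__":
    # Hypothetical component values, not taken from the CMAD design.
    N_BITS, V_REF, R, R_F = 10, 1.0, 20e3, 40e3
    for code in (0, 1, 511, 512, 1023):
        print(code, dac_output(code, N_BITS, V_REF, R, R_F))
```

Summing the series shows that Eq. 3.8 is equivalent to $i_{TOT} = \frac{V_{ref}}{2R}\cdot\frac{D}{2^N}$, where $D$ is the integer value of the input word, so an ideal ladder is linear in the code.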

Power consumption of the D/A in Fig. 3.13 is approximately 1.1 mW including the opamp. A current mirror implementation with the same functionality and power consumption would exhibit a much larger area. CMOST-only R-2R core operates with a current ranging from 20 to 50  $\mu A$  which is negligible compared to the one consumed by the opamp.

An important practical issue in contrast to the case of the resistive ladder is that the small-signal equivalent resistance seen between the drain and source terminals of the MOSFETs is not identical throughout the ladder. This leads to a mismatch error in current division and degrades the maximum resolution achievable. The transistor length has a lower limit determined by the matching accuracy of the given technology and the resolution that has to be reached. But there are more severe constraints



Figure 3.15: The opamp used as the output current-to-voltage converter in the D/A.

in choosing the length. Even if the matching is perfect, the ladder will not deliver perfectly binary weighted currents. The reason is as follows.

The ladder is a sequence of MOS transistors in parallel and in series. Although the MOSFETs operate in the linear region, the relationship between the *lateral field* and the *carrier velocity* is nonlinear, at least for the MOSFETs on the MSB side which operate in the vicinity of the saturation region. Since the drain-source voltages change throughout the ladder, the strength of this effect differs for every MOS device in the ladder [73]. To keep the errors stemming from these effects small, a large transistor length has to be chosen.

Another issue relating to the accuracy of the output voltage of the D/A is the offset of the opamp. Fig. 3.15 and Fig. 3.16 show the implementation of the D/A output opamp and its layout, respectively. Even though it is a basic two-stage amplifier with no offset cancellation scheme, the interdigitated layout is fairly symmetric, leading to lower offset levels.

Seen in Fig. 3.13, the accuracy of the current division at LSB depends on a



Figure 3.16: Layout $(80 \times 240 \mu m^2)$ of the opamp used as the output current-to-voltage converter in the D/A.

requirement that both input voltage levels of the opamp are equal. In case there is an offset between these two inputs, the ladder would have an unbalanced current division chain resulting in a poor differential behavior. This error depends on the digital word applied to the D/A converter and is thus signal dependent. This means that distortion is added to the output signal. It can be shown that the error current due to offset is bounded by [12]

$$\frac{|V_{off}|}{4R} \le \Delta I \le \frac{|V_{off}|}{R} \tag{3.9}$$

where R is the equivalent resistance of the ladder and  $|V_{off}|$  is the absolute value of the offset voltage of the operational amplifier.
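To get a feeling for the size of this bound, Eq. 3.9 can be compared with the LSB current implied by Eq. 3.8, $V_{ref}/(2R\,2^N)$. The sketch below does so under the explicit assumption that the $R$ of Eq. 3.9 and the $R$ of Eq. 3.8 denote the same quantity, in which case $R$ cancels; the function names and the example numbers are hypothetical.

```python
def offset_error_bounds(v_off, r):
    """Eq. 3.9: (lower, upper) bound on the offset-induced error current."""
    mag = abs(v_off)
    return mag / (4.0 * r), mag / r


def bounds_in_lsb(v_off, v_ref, n_bits):
    """The same bounds divided by the LSB current of Eq. 3.8, V_ref / (2 R 2**N).

    Assumes the R of Eq. 3.9 and the R of Eq. 3.8 are the same quantity,
    so that R drops out of the ratio.
    """
    scale = 2.0 * 2 ** n_bits / v_ref
    return abs(v_off) / 4.0 * scale, abs(v_off) * scale


if __name__ == "__main__":
    # Hypothetical example: 1 mV offset, 1 V reference, 10 bits.
    print(bounds_in_lsb(1e-3, 1.0, 10))
```

Under that assumption the error expressed in LSB depends only on the ratio $|V_{off}|/V_{ref}$ and on the resolution, which is one way to see why the choice of $V_{ref}$ matters for the offset sensitivity.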

Summarizing the above discussion, which is based on a set of $MC^{10}$ simulations performed on the transistor level implementation, the D/A represented in Fig. 3.13 turned out to be dominantly sensitive either to the mismatch between the current divider transistors or to the offset of the output opamp. In practice, both effects degrade the D/A performance, to an extent that depends on the parametrization of the devices forming the ladder and the input stage of the opamp.

Apart from improving the opamp behavior via applying an error trimming scheme or re-sizing the current divider transistors within the ladder, one can use different  $V_{ref}$  voltages setting the virtual ground of the circuit to find an optimum compromise between the above stated issues.  $V_{ref}$  sets the drain-source potential of the

<sup>&</sup>lt;sup>10</sup>Monte Carlo.

current dividers and changes their resistances and the mismatches among them. Since a lower $V_{ref}$ results in a larger $V_{gs}$ of the current divider transistors, it helps to improve the mismatch distribution; however, it also decreases their resistances, making the ladder more sensitive to the offset of the opamp. On the other hand, a higher $V_{ref}$ degrades the matching between the current divider transistors by lowering their overdrive voltage, but at the same time it increases their resistance, making the whole ladder less sensitive to the offset of the opamp.

As an alternative solution, the gate voltages of the transistors forming the current dividers can be controlled in an independent manner to find an optimum. However this brings the necessity that the D/A input logical levels must also be equal to that value [73] for proper operation, complicating the overall circuitry.

Monte Carlo simulations were performed extensively, both for mismatch only and for process and mismatch together, to find the optimum biasing levels without any trimming. Fig. 3.17 shows some worst-case MC simulation results for a complete scan of the input code. The MSB transitions, seen in the middle of the curves, can show jumps of a few LSB. Therefore, the MSB transition is identified as a region not to be used for setting the threshold of the comparator. This does not impose any difficulty in practice, for two reasons. Firstly, the D/A will not be used to change the baseline or the threshold dynamically; it will be assigned a value that remains the same during operation. Secondly, two different but identical D/As set the baseline at the output of the shaper and the threshold of the comparator. Therefore, in operation, staying away from the MSB transition region is easily achievable.

The MSB jump is actually expected from the ideal simulations performed for comparing architectures. As seen in the bottom-right sub-plot of Fig. 3.12, the differential non-linearity has a large jump in the middle, corresponding to the MSB transition. This is the result when no compensation scheme is applied in a binary-weighted architecture.

The main concern, for the D/A operation, is the stability of the output levels as they are going to generate references both for BLH of the shaper and for the threshold of the comparator. Fig. 3.18 shows, as an example, one of the MC simulation



Figure 3.17: MC full scan.




Figure 3.18: MC simulation result for the MSB transition.


results representing the variation of the output level. A standard deviation of 1.54 mV in step size and 3.53 mV in output step height shows what one must expect from the actually fabricated chips. These values represent the worst-case condition of the transition from 0111111111 to 1000000000 and are still within the noise level of the channel operation. It must be noted that the MC simulation result in Fig. 3.18 shows that, with a very small probability, the D/A can lose its monotonicity, as one out of 50 runs resulted in a negative step size. These MC results are dominantly affected by the mismatches between the devices.

It should be noted that the worst-case corners are different from the worst-case results selected from MC simulations. In particular, the mismatch between the devices has a severe influence in MC, whereas it has none in worst-case corners. This is because, in worst-case corners, the simulator assumes the "worst" or the "best" parameter set identically for "all" the devices at the same time. Thus all the devices have the same set of parameters and the effect of mismatch is not included in the results. In MC simulations, on the other hand, all the parameters of all the devices are randomly extracted from the related distributions, leading to a better representation of the mismatch between the devices. Some devices turn out better and some worse according to these distributions, so running a sufficient number of MC simulations and isolating the worst results leads to more realistic expectations for worst-case scenarios.
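The distinction can be illustrated with a deliberately simple toy model, which is not the simulator setup used for the CMAD: two nominally identical branches described by a square-law current, with the threshold voltage as the only varying parameter. All names and numbers below are hypothetical.

```python
import random
import statistics


def branch_current(v_th, v_gs=1.0, k=1e-4):
    """Square-law drain current of one branch (toy model)."""
    return 0.5 * k * (v_gs - v_th) ** 2


def corner_difference(v_th_nominal, corner_shift):
    """Corner run: every device receives the same shifted parameter set."""
    a = branch_current(v_th_nominal + corner_shift)
    b = branch_current(v_th_nominal + corner_shift)
    return a - b


def monte_carlo_differences(v_th_nominal, sigma_process, sigma_mismatch,
                            runs, seed=0):
    """MC run: one common process draw per run plus an independent draw per device."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(runs):
        common = rng.gauss(0.0, sigma_process)
        a = branch_current(v_th_nominal + common + rng.gauss(0.0, sigma_mismatch))
        b = branch_current(v_th_nominal + common + rng.gauss(0.0, sigma_mismatch))
        diffs.append(a - b)
    return diffs


if __name__ == "__main__":
    print("corner:", corner_difference(0.5, +0.05), corner_difference(0.5, -0.05))
    diffs = monte_carlo_differences(0.5, 0.03, 0.005, runs=200)
    print("MC std of branch difference:", statistics.pstdev(diffs))
```

In this toy model a corner shifts both branches identically, so their difference is exactly zero, whereas the MC runs produce a non-zero spread that originates entirely from the per-device term.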

For comparison, Fig. 3.19 shows the worst INL and DNL of the implemented D/A. Even though the MSB region has the worst INL behavior, which is associated with the architecture, both INL and DNL are still acceptable, since the INL is smaller than one LSB and the DNL is smaller than half an LSB. It is therefore evident that, for this specific circuit, the corner analysis result conflicts with the MC worst-case condition.

Figure 3.19: Worst case INL and DNL in corner analysis.

#### **On-Chip Biasing**

On-chip biasing is implemented via reference sources based on LDOs driven by bandgap voltage sources, as seen in Fig. 3.20 and Fig. 3.21. Linear voltage regulators use an active pass element (MP) to reduce the input voltage (Vdd) to the regulated output voltage ( $V_{out}$ ). Linear voltage regulators force a fixed voltage level to appear at the output terminal [70]. In Fig. 3.20, assuming high enough opamp gain, the output voltage is given by

$$V_{OUT} = V_{REF} \cdot \frac{R_1 + R_2}{R_2} \tag{3.10}$$

where  $V_{REF}$  is the reference provided by the band-gap source.
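Eq. 3.10 can be used in both directions: to predict the output for a given divider, or to size $R_1$ for a target output. The helper names and the resistor values below are hypothetical and are not taken from the CMAD design.

```python
def ldo_output(v_ref, r1, r2):
    """Eq. 3.10: regulated output for an ideal (very high gain) error amplifier."""
    return v_ref * (r1 + r2) / r2


def r1_for_target(v_ref, v_out, r2):
    """Invert Eq. 3.10 for R1, given a target output and a chosen R2."""
    return r2 * (v_out / v_ref - 1.0)


if __name__ == "__main__":
    # Hypothetical numbers: a 1.2 V reference and a 10 kOhm lower resistor.
    r1 = r1_for_target(1.2, 3.0, 10e3)
    print(r1, ldo_output(1.2, r1, 10e3))
```

Since only the ratio of the two resistors enters, a common process shift of both resistors leaves the output unchanged to first order.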

The LDOs implemented for the CMAD are optimized for the sub-circuit requirements and consume 0.9 mW from a single 3.3 V supply. Additional pads are also provided to allow the on-chip LDO reference to be disabled, so that external sources can be applied. As an example, this is one of the ways in which the LSB, and thus the resolution of the D/As, can be adjusted for different user conditions.



Figure 3.20: LDO voltage reference.

Fig. 3.22 and Fig. 3.23 show the implementation of the opamp-less band-gap reference driving the LDOs and its layout, respectively. Band-gap voltage references



Figure 3.21: Layout  $(260 \times 140 \mu m^2)$  of LDO voltage reference.

combine the positive temperature coefficient  $(TC)^{11}$  of the thermal voltage with the negative TC of the diode forward voltage (the band-gap energy,  $E_g$ , of silicon decreases with increasing temperature) in a circuit to achieve a voltage reference with an effectively zero TC. Once one has a temperature-independent voltage, it is a simple matter, e.g. with the use of an opamp, to generate multiples of it. The reference voltage output, seen in Fig. 3.22, is given as

$$V_{out} = V_{BE3} + 5 V_T \ln(n) \tag{3.11}$$

where $V_{BE3}$ is the base-emitter potential difference of Q3, $V_T$ is the thermal voltage and n (=33) is the area ratio between Q2 and Q1. The numerical factor of 5 is specific to the schematic and is the ratio between its only two resistors. The band-gap reference in Fig. 3.22 consumes 93 $\mu W$ from a single 3.3 V supply.
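The cancellation of the two temperature coefficients in Eq. 3.11 can be checked numerically. In the sketch below the factor 5 and $n = 33$ come from the text, while the base-emitter voltage at 300 K (0.70 V) and its slope (-1.5 mV/K) are assumed order-of-magnitude values, chosen close to the computed PTAT slope so that the cancellation is visible; they are not extracted from the design.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_k):
    """V_T = k T / q."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def ptat_term(temp_k, ratio=5.0, n=33):
    """Second term of Eq. 3.11: ratio * V_T * ln(n)."""
    return ratio * thermal_voltage(temp_k) * math.log(n)


def bandgap_output(temp_k, v_be_300=0.70, v_be_slope=-1.5e-3):
    """Eq. 3.11 with a linearised V_BE3(T); both V_BE numbers are assumptions."""
    v_be = v_be_300 + v_be_slope * (temp_k - 300.0)
    return v_be + ptat_term(temp_k)


if __name__ == "__main__":
    for temp in (250.0, 300.0, 350.0):
        print(temp, ptat_term(temp), bandgap_output(temp))
```

With these inputs the second term of Eq. 3.11 rises by about 1.5 mV/K, which is what allows it to offset a base-emitter voltage falling at a similar rate.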

CMAD is a multi-channel ASIC which requires an identical biasing scheme. Fig.

<sup>&</sup>lt;sup>11</sup>The temperature coefficient or TC is the relative change of a physical property when the temperature is changed by 1 K. Positive and negative TC refer to the direction of change.



Figure 3.22: Implemented opamp-less band-gap reference.



Figure 3.23: Layout  $(135 \times 300 \mu m^2)$  of opamp-less band-gap reference.



Figure 3.24: Implemented D/A biasing scheme.



Figure 3.25: Layout  $(120x320\mu m^2)$  of D/A biasing scheme.



Figure 3.26: Alternative D/A biasing scheme.

3.24 and Fig. 3.25 show, as an example, the architecture of the current sink driving the D/A and its layout, respectively. Here, the single opamp adjusts the gate-source potential differences of all the transistors together, keeping the value of the current flowing through them constant as

$$I_i = \frac{V_{REF}}{R_2} \tag{3.12}$$

where  $V_{REF}$  is the reference voltage provided by the LDO source. There are two possible choices: having a single opamp-resistor couple and relying on transistor matching, or alternatively multiplying the opamp-resistor couples and relying on resistor matching which also comes with a bigger area as seen in Fig. 3.26.

Fig. 3.27 shows a MC simulation result for the circuit of Fig. 3.24 showing the difference of the currents flowing through two different branches which are arbitrarily selected. The D/A current of the branches is set to  $25\mu A$ . The difference has a standard deviation of 80nA and is approximately 500nA peak-to-peak as a result of 200 runs.

On the other hand, Fig. 3.28 shows a MC simulation result for the circuit of Fig. 3.26 showing the difference of the currents flowing through two different branches which are arbitrarily selected. The D/A current of the branches is set to  $25\mu A$ . The difference has a standard deviation of 231nA and is approximately  $1.5\mu A$  peak-to-peak as a result of 200 runs.

As seen from the above MC simulations, relying on transistor matching results in a narrower mismatch distribution motivating the current choice, depicted in Fig. 3.24.

A practical issue relates to the matching between the resistors in the current sink biasing the D/As, namely $R_2$ of Fig. 3.24 and the one used at the output of the D/A, namely $R_f$ of Fig. 3.13. Both the absolute values of these resistors are important for the output DC level of the D/A. In case there is a process variation affecting the absolute value of $R_f$, one would like also $R_2$ to be affected accordingly so that the ratio of $R_f/R_2$ remains the same. This is because in case of a higher (lower) $R_2$, the current flowing through $R_f$ would be lower (higher) resulting in a lower (higher) D/A output. To compensate for this, $R_f$ must also have an accordingly high (low) value keeping the D/A output at its nominal level, since the variation is linear as



Figure 3.27: MC simulation result showing the current difference distribution of two arbitrary branches of Fig. 3.24; both process variations and device mismatches are included.



Figure 3.28: MC simulation result showing the current difference distribution of two arbitrary branches of Fig. 3.26; both process variations and device mismatches are included.

I = V/R. Any mismatch between the two resistors would result in erroneous output which must be minimized.
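The ratio argument can be made concrete by combining Eq. 3.12 with Eq. 3.7: the bias current scales as $1/R_2$ and the output voltage as $R_f$, so the output magnitude depends only on $R_f/R_2$. The function below is a sketch of that relation; its name, the `fraction` parameter (the share of the bias current routed to $R_f$ by the input code) and the numbers are hypothetical.

```python
def dac_output_magnitude(v_ref, r2, r_f, fraction=1.0):
    """|V_out| when a given fraction of I = V_ref / R2 (Eq. 3.12) flows in R_f."""
    bias_current = v_ref / r2
    return fraction * bias_current * r_f


if __name__ == "__main__":
    nominal = dac_output_magnitude(1.0, 40e3, 40e3)
    # Common process shift of +20 % on both resistors: the ratio is preserved.
    tracked = dac_output_magnitude(1.0, 1.2 * 40e3, 1.2 * 40e3)
    # Shift on R2 only: the output drops.
    untracked = dac_output_magnitude(1.0, 1.2 * 40e3, 40e3)
    print(nominal, tracked, untracked)
```

This is why the two resistors are best made of the same material and matched in layout, so that process variations act on both in the same direction.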

#### Comparator

Fig. 3.29 shows the implemented comparator stage applying the *cut* on the output of the shaper. It consists of a series of identical gain stages, denoted as G, and the feed-back, H, which establishes the hysteresis by unbalancing the current steered from one branch to the other. The output buffer, B, is followed by the one-shot block, not seen in the figure, which allows the output to be set either as a fixed-length digital signal or as a variable-length one for the so-called *time-over-threshold* capability. Time-over-threshold is defined as the width of the discriminated signal above the threshold level; it thus gives a first-order estimate of the amount of charge left by the detected particle.
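The relation between time-over-threshold and signal amplitude can be illustrated with a simple model. The sketch below assumes a CR-RC pulse shape with a 15 ns peaking time, which is an assumption made for illustration and not the simulated shaper response; the function names are hypothetical and amplitudes are in arbitrary units.

```python
import math


def shaped_pulse(t, amplitude, tau):
    """CR-RC pulse normalised so that the peak equals `amplitude` at t = tau."""
    if t <= 0.0:
        return 0.0
    x = t / tau
    return amplitude * x * math.exp(1.0 - x)


def time_over_threshold(amplitude, threshold, tau=15e-9,
                        t_max=300e-9, steps=30000):
    """Width of the interval where the pulse exceeds the threshold, by sampling."""
    dt = t_max / steps
    above = sum(1 for i in range(steps + 1)
                if shaped_pulse(i * dt, amplitude, tau) > threshold)
    return above * dt


if __name__ == "__main__":
    for amp in (0.5, 2.0, 4.0, 8.0):
        print(amp, time_over_threshold(amp, threshold=1.0))
```

In this model the width grows monotonically but not linearly with the amplitude, which is consistent with describing time-over-threshold as a first-order estimate only.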



Figure 3.29: Transistor level implementation of the comparator.

# Chapter 4

# Test of The CMAD

This chapter briefly presents the CMAD prototypes together with the major modifications made in successive fabrications. A summary of the relevant measurement results which are coherent with the simulated behavior will be given for the production version of the ASIC. An overall comparison to the existing FE system will conclude the chapter.

### 4.1 Prototypes

In the framework of the project, 2 prototypes were fabricated before the production version denoted as the CMAD throughout the text, namely CMADv1 and CMADv2. The full functionality is reached gradually and the new abilities are introduced at each prototyping cycle.

CMADv1 is the prototype to verify the basic front-end functionality. The gain control and the D/A were not included in this chip.

In CMADv2, the second submission, programmability of the CSA gain is introduced. The full processing chain is put together in this prototype, with a single 10-bit D/A per channel to control the threshold voltage, of which the actual resolution is 8 bits. The two most significant bits of the D/A are moved together with the baseline, so they do not contribute to the threshold resolution. The 8-bit D/A used in the second submission does not have a direct output pad, which makes it impossible to measure the threshold directly. The only way to systematically test



Figure 4.1: Additional grounding/monitoring lines.

the D/A functioning is the so-called *threshold scanning* which is a quite common technique in nuclear and HEP experiments.
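As a generic illustration of the technique, and not of the actual procedure or data used for CMADv2, the sketch below simulates a threshold scan: pulses of fixed amplitude with Gaussian noise are counted for each threshold setting, and the setting at which half of them are detected estimates the pulse amplitude. The function names, units and numbers are hypothetical.

```python
import random


def scan_threshold(thresholds, amplitude, noise_rms, pulses=2000, seed=0):
    """Fraction of pulses whose noisy peak exceeds each threshold setting."""
    rng = random.Random(seed)
    peaks = [rng.gauss(amplitude, noise_rms) for _ in range(pulses)]
    return [sum(1 for p in peaks if p > th) / pulses for th in thresholds]


def half_efficiency_point(thresholds, efficiencies):
    """Linearly interpolate the threshold at which the efficiency crosses 50 %."""
    points = list(zip(thresholds, efficiencies))
    for (t0, e0), (t1, e1) in zip(points, points[1:]):
        if e0 >= 0.5 > e1:
            return t0 + (e0 - 0.5) * (t1 - t0) / (e0 - e1)
    return None


if __name__ == "__main__":
    settings = [0.5 * i for i in range(201)]  # arbitrary units
    eff = scan_threshold(settings, amplitude=50.0, noise_rms=2.0)
    print(half_efficiency_point(settings, eff))
```

Repeating such a scan for known injected amplitudes relates D/A codes to effective threshold levels without direct access to the D/A output, and the steepness of the transition reflects the noise.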

The production system is based on the experience gathered from the previous prototypes. The *CMAD* has two independent D/As with a dynamic range of 10 bits, one controlling the threshold of the comparator and the other controlling the baseline itself. This provides a more flexible operation. The D/A setting the threshold of the comparator has a direct connection to an output pad for monitoring purposes. A dedicated line connects the output of the D/A to an output pad to enable testing; otherwise the line is connected to ground, establishing an additional in-between grounding that decreases the cross-talk between successive channels, as seen in Fig. 4.1. Thick squares on the left hand side of the figure represent the input pads, where only two channels are visible. The switches, denoted as $S_g$ in the figure, control the function of the line, either monitoring or additional grounding. This last prototype was needed because of an offset problem between the channels, the details of which are given in Chapter 7.



Figure 4.2: Conceptual CMAD test setup.

## 4.2 Test Setup

Fig. 4.2 and Fig. 4.3 show the generic test concept used for the CMAD measurements and the actual configuration, respectively. The setup, which has not changed much throughout the prototypes, consists of a stimuli provider controlled by a fixed-frequency trigger generator. The stimuli provider is realized either by a simple electronic pulser or by a photo-multiplier tube (PMT) driven by an LED light source to imitate the actual RICH operational environment. The generated stimuli are then sent to the test board<sup>1</sup> on which the CMAD resides. After the signal processing performed by a single CMAD channel, the read-out system<sup>2</sup> takes over and sends the data available at the output of the CMAD channel to a CAMAC controller via an optical fiber<sup>3</sup>. Finally, the data are monitored by an on-line monitoring application that calculates the related statistics.

## 4.3 Measurement Results

Measurements showed good agreement with simulation results. The gain and the output pulse shape of the preamplifier are adjustable by controlling the values of the capacitive and resistive components in its feedback path. These component values can be set

<sup>1</sup>Horizontal card in Fig. 4.3.

<sup>2</sup>Vertical card attached to the CMAD test board in Fig. 4.3.

<sup>3</sup>Orange cable leaving the readout card at its right-top corner in Fig. 4.3.



Figure 4.3: The CMAD test setup.

either independently or in a correlated manner in order to preserve the shape of the output signal.

The measured gain of the preamplifier is a strong function of the resistor value, as seen in Fig. 4.4. In order to keep the optimum signal shape, the capacitor should be adjusted in such a way that the time constant remains the same. The capacitor value has only a slight effect on the preamplifier gain; it is used to adjust the time constant but not the gain itself. An increase in the binary code for the resistor must be accompanied by an increased capacitor code. Reverse logic is used internally in the chip so that the two digital codes change in the same direction while the optimum signal shape is maintained.
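The constraint described above can be written compactly. As a sketch, assuming that the relevant time constant is the product of the feedback resistance and capacitance (the symbols $R_f$, $C_f$ and $\tau_f$ are introduced here for illustration only):

$$\tau_f = R_f C_f = \text{const} \quad\Longrightarrow\quad C_f' = C_f \, \frac{R_f}{R_f'}$$

Under this assumption, a change of the resistance from $R_f$ to $R_f'$ calls for the inverse change of the capacitance, which would be consistent with the reverse logic used to pair the two digital codes.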

For test purposes, in one of the channels of the CMADv2 prototype the output of the shaper is connected not to the input of the comparator but to an output pad, so that the linearity of the preamplifier output, which is important for proper operation, can be probed directly. Fig. 4.5 shows the measurement results. In the upper plot, circles represent the normalized preamplifier output values and the solid line is the linear fit. The non-linearity is less than 2%, as seen in the residual between



Figure 4.4: Measurement results for adjustable gain of the preamplifier as a function of R and C binary D/A converter inputs.



Figure 4.5: Gain linearity of the preamplifier; the measurement and the linear fit (upper plot) and the difference between fit and measurement.

the data and the linear fit given in the bottom plot of Fig. 4.5.

After choosing an appropriate CSA gain and observing the sufficient linearity of the front-end consisting of CSA, Shaper, and BLH system, one would be interested in whether the threshold which is set by the user is correct. In the CMADv2 prototype, the D/A driving one of the inputs of the comparator has no output pad for direct probing, thus an indirect method is needed to test proper functioning of the D/A.

As a standard technique for this type of read-out chain, the so-called *threshold scanning* was performed to observe the correspondence between the value that is set and the actual threshold. From this, a noise figure can be calculated that relates to the precision of the overall binary processing chain. A threshold scan measurement is especially suitable when the number of D/As is large, as is usually the case for trackers and ring imaging detectors. In such a measurement, the threshold of the comparator is set to a certain value while "white" stimuli are maintained at the input. White stimuli are a set of signals covering the whole dynamic range of the processing chain; the expected result is thus a plateau which starts just after the threshold set by the user.

The rising and the falling edges of the plateau are assumed to approximate the *sigmoid function* given as

$$P(t) = \frac{1}{1 + e^{\pm t}} \tag{4.1}$$

which is a special case of the so called *logistic function* given as

$$P(t; a, m, n, \tau) = a \cdot \frac{1 + me^{t/\tau}}{1 + ne^{t/\tau}} \tag{4.2}$$

where a, m, n, and  $\tau$  are real values. A logistic function or logistic curve models the *s*-curve of growth of some set P. Fig. 4.6 shows the s-curve (upper plot) with the terminology as applied to threshold scanning measurement and its derivative (bottom plot). The initial stage of the growth is approximately exponential; then, as saturation begins, the growth slows, and at maturity, it stops.

In the threshold scan measurement, the set denoted as P(t) is the number of counts per input threshold value and the derivative of the *growth* is interpreted as the *noise* of the overall system.
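The relation between Eq. 4.1 and Eq. 4.2 can be checked numerically. The following is a minimal sketch; the function names and the central-difference helper are illustrative choices, not part of the measurement software. Setting $a = 1$, $m = 0$, $n = 1$ and $\tau = \pm 1$ in the logistic function reproduces the sigmoid.

```python
import math

def logistic(t, a=1.0, m=0.0, n=1.0, tau=1.0):
    """General logistic function of Eq. 4.2."""
    e = math.exp(t / tau)
    return a * (1.0 + m * e) / (1.0 + n * e)

def sigmoid(t, sign=-1):
    """Sigmoid of Eq. 4.1: sign=-1 gives a rising edge, sign=+1 a falling edge."""
    return 1.0 / (1.0 + math.exp(sign * t))

def derivative(f, t, h=1e-5):
    """Central-difference estimate of df/dt (illustrative helper)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

# The rising sigmoid is the logistic function with a=1, m=0, n=1, tau=-1.
rising_from_logistic = logistic(0.7, tau=-1.0)
rising_from_sigmoid = sigmoid(0.7, sign=-1)

# The derivative of the rising edge peaks at t = 0, where the slope is 1/4.
slope_at_center = derivative(lambda t: sigmoid(t, -1), 0.0)
```

The derivative is largest at the centre of the edge and vanishes on the plateau, which is the behaviour sketched in the bottom plot of Fig. 4.6.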

Ideally, the s-curve should have a *step* shape, i.e. the signal at the input of the



Figure 4.6: S-curves (upper) and their derivative (bottom).

chain is either cut or passed to the read-out system, depending on the threshold value that is set and on the input signal pulse height. For this reason the system is called a binary read-out system. However, the uncertainty, or equivalently the noise, of the channel turns the ideal discrete situation into a smooth transition. Therefore the derivative of the rising and falling edges of the plateau is interpreted as the precision of the chain (ideally a delta function). The FWHM<sup>4</sup> of the derivative is treated as the noise level of the processing chain, as depicted in the bottom of Fig. 4.6.

The test is performed as follows: constant input stimuli are applied, the threshold is scanned from 0 to 1023, and the plateau is generated. Fig. 4.7 depicts the situation, where both the expected plateau (upper) and its derivative are plotted. The distance between the two centroids of the Gaussian-shaped derivatives is expected to be equal to the input pulse height. The input pulses have known heights. Together with the knowledge of the correspondence between the digital D/A input code and the resulting threshold<sup>5</sup>, the distance between the two centroids can be calculated. Therefore the noise of the overall processing chain can be evaluated for different conditions.
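The procedure above can be illustrated with a synthetic scan. This is a minimal sketch under stated assumptions, not the analysis code used for the measurements: the channel noise is taken as Gaussian, the plateau is modelled as the difference of two Gaussian cumulative distributions, all quantities are expressed in D/A digits, and the function names and numerical values are hypothetical.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def synthetic_scan(baseline, height, sigma, n_pulses=10000, n_codes=1024):
    """Expected counts per threshold code for pulses of fixed height (assumed model)."""
    return [n_pulses * (phi((c - baseline) / sigma) - phi((c - baseline - height) / sigma))
            for c in range(n_codes)]

def edge_stats(counts, rising=True):
    """Centroid and r.m.s. width of one edge, from the first difference of the scan."""
    diff = [counts[i + 1] - counts[i] for i in range(len(counts) - 1)]
    # Keep only the lobe of interest: positive for the rising edge, negative for the falling one.
    w = [max(d, 0.0) if rising else max(-d, 0.0) for d in diff]
    total = sum(w)
    # The difference between codes i and i+1 is assigned to the midpoint i + 0.5.
    mean = sum((i + 0.5) * wi for i, wi in enumerate(w)) / total
    var = sum(((i + 0.5) - mean) ** 2 * wi for i, wi in enumerate(w)) / total
    return mean, math.sqrt(var)

# Hypothetical example: baseline at code 200, pulse height of 300 digits, noise of 4 digits r.m.s.
counts = synthetic_scan(baseline=200.0, height=300.0, sigma=4.0)
rise_c, rise_s = edge_stats(counts, rising=True)
fall_c, fall_s = edge_stats(counts, rising=False)

pulse_height = fall_c - rise_c                           # distance between the two centroids
fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * rise_s    # FWHM of a Gaussian with this r.m.s.
```

In this model the distance between the two centroids recovers the injected pulse height, and the width of each derivative lobe recovers the injected noise, up to a small broadening caused by the finite code step.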

<sup>4</sup>FWHM stands for full width at half maximum which is interpreted as the thickness of a Gaussian distribution.

<sup>5</sup>This is also named granularity.



Figure 4.7: Measurement of the channel noise.

Fig. 4.8 shows a single threshold scan measurement where the number of counts is shown on the y axis and the threshold value on the x axis. The first plot shows the expected plateau starting just after the threshold value corresponding to the digital word driving the D/A. The second plot shows the same distribution without input stimuli and with a threshold very close to the baseline, thus having a peak due to the noise. The last plot presents the measurement result acquired with a photo-multiplier tube driven by an LED imitating Cherenkov radiation; there is no cut-signal region, since the input signals (or stimuli) are larger than the largest settable threshold value.



Fig. 4.9 shows, as an example, a threshold scan measurement to calculate the

Figure 4.8: A threshold scan measurement.



Figure 4.9: Measured channel noise of a CMAD channel.




Figure 4.10: Channel efficiency measurements both for the CMAD and the MAD-4 as a function of event rate.

channel noise. The measurement is performed with an LSB size, or equivalently a granularity, of 0.5 mV/digit at the D/A which sets the comparator threshold. The smooth nature of the input signal cutting is evident in the bottom plots (expected plateau). However, the derivatives belonging to the rising and the decaying regions differ, as is evident from the corresponding upper plots. According to this specific measurement, the noise of the channel (i.e. the ambiguity of the binary decision made by the channel) is less than 5 mV or, equivalently, less than 5 fC, considering the gain of the CSA stage.
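The equivalence between the two noise figures quoted above follows from dividing the voltage noise by the CSA gain. As a worked example, assuming a gain setting of $G = 1\ mV/fC$ (a value inferred from the quoted equivalence, not a separately stated setting) and denoting the noise in voltage and charge by $\sigma_V$ and $\sigma_Q$ (symbols introduced here for illustration):

$$\sigma_Q = \frac{\sigma_V}{G} < \frac{5\ mV}{1\ mV/fC} = 5\ fC, \qquad \frac{5\ mV}{0.5\ mV/\text{digit}} = 10\ \text{digits}$$

In other words, at this granularity the quoted noise bound corresponds to about ten D/A codes.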

Finally, the last level of measurements relates to the efficiency. Designed to match the specific features imposed by fast multi-anode photo-multipliers that guarantee full efficiency up to 5 MHz/Ch, the CMAD has to sustain the same event rate in order to overcome the limitation of the MAD-4 at 1 MHz. Fig. 4.10 shows the efficiencies of the CMAD and the MAD-4 as a function of event rate, demonstrating that the new ASIC can effectively sustain a higher rate, being more than 90% efficient at 6 MHz/Ch. The CMAD data in Fig. 4.10 are acquired with the slew rate limiting buffer (SRLB), seen in Fig. 3.4, both enabled (filled circles) and disabled (empty circles). It must be noted that the vertical difference between empty and filled circles gets larger as the input trigger rate increases. Even though the effect of the SRLB is negligible at low input trigger rates, it gives rise to a higher processing speed at higher frequencies, experimentally confirming the design motivation given in the previous chapter.

Table 4.1 summarizes the CMAD properties whereas Fig. 4.11 shows the chip layout of the CMAD which is the production version.

| Property                           | Value                       |
|------------------------------------|-----------------------------|
| Technology                         | $0.35~\mu m$                |
| Number of Channels                 | 8/Chip                      |
| Preamplifier Gain Range            | 0.4-1.2 and 1.6-4.8 $mV/fC$ |
| Preamplifier Gain Resolution       | $0.1 \ mV/fC$               |
| Peaking Time                       | 10 ns                       |
| Processing Speed                   | >5 MHz/Ch                   |
| Chip Size                          | $4.8 \times 3.1 \ mm^2$     |
| Power (w/ LVDS Drivers)            | 26 mW                       |

Table 4.1: Properties of the CMAD.



Figure 4.11: Layout  $(4.8 \times 3.1 mm^2, \text{ with pad-ring})$  of the CMAD.


# Chapter 5

# CP-PLL Based Serializer for the GBT System

## 5.1 Introduction

The Large Hadron Collider (LHC) at CERN is the longest (27 km) circular accelerator worldwide and, for now, also the one in which the particles reach the highest energies (of the order of a few TeV). At the frontier of the field, the four main experiments located on the LHC, namely ALICE, ATLAS, CMS and LHCb, aim at studying the basic constituents of matter and their interactions with a design luminosity<sup>1</sup> of $10^{34} cm^{-2} s^{-1}$. The LHC experiments have extensive physics programs which in total cover a wide range of physics goals. The programs range from the Higgs search to the verification of the theory of super-symmetry, together with confirming previously observed phenomena, this time in different energy ranges. A possible confirmation of the existence of the Higgs boson and of super-symmetric particles might explain, respectively, the mechanism of electro-weak symmetry breaking, which gives masses to particles, and the issues of dark matter and dark energy. The locations of the main experiments on the LHC *ring*, together with the main pre-LHC accelerator stages, are depicted in Fig. 5.1.

<sup>1</sup>The measure, *luminosity*, is the number of particles within the collisions or equivalently the average *brilliance* of the collisions.



Figure 5.1: LHC accelerator stages (not to scale) and main experiments.

Nuclear and HEP experiments of such sizes need a correspondingly large communication infrastructure to operate. Three concurrent systems are typically required: Data AcQuisition (DAQ), Timing Trigger and Control (TTC) and Slow Control<sup>2</sup> (SC). These systems have very different requirements regarding the data transmission bandwidth of their links.

Proper timing of electronic signals is essential where detector systems are composed of different sub-detector systems which must operate together. Monitoring of the detector status, adjusting detector parameters, physics event selection (triggering) and data acquisition all depend on the proper timing of operations.

The next sections summarize the current status of such a system developed for the LHC and continue with the upgrade to S-LHC, for which the PLL based serializer is developed within the framework of the GBT project.

## 5.2 The TTC System

The Timing, Trigger and Control (TTC) system is the distributor of the fast timing signals at the LHC [77]. The timing signals generated by the LHC radio frequency (RF) generators have to be distributed to all experiments and to the beam instrumentation. In the system, timing signals are conveyed from the RF generators to the LHC central control room via single mode optical fibers which are approximately 10 km long. The control room is the star-topology distribution point to the experiments: these connections are made via single mode optical fibers, at 1310 nm, and their lengths range from 3 to 10 km depending on the location of the experiment. Once at the experiment sites, additional trigger and control information is merged with the timing signals and distributed through the TTC system. At the experiment level, the trigger acceptance is generated, reset commands for registers are sent and decisions are made regarding a sub-detector mode (e.g. test or calibration).

<sup>2</sup>e.g. detector and experiment control systems, DCS and ECS.

### 5.2.1 Timing

The LHC beam is not continuous but consists of a series of bunches, i.e. groups of particles which move together through the accelerator, as seen in Fig. 5.2. The bunch spacing is 25 ns, or about 7.5 m, which corresponds to an accelerator operation frequency of approximately 40.08 MHz. Moreover, the bunch filling is not continuous either, due to issues related to the accelerator operation, and only three fourths of the bunches will be present, corresponding to a number of approximately 2800 [21].
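The numbers quoted above are mutually consistent, as a short calculation shows. The sketch below assumes ultra-relativistic particles travelling at essentially the speed of light and uses the rounded 27 km circumference from the introduction; the function and variable names are illustrative.

```python
C_LIGHT = 299_792_458.0   # speed of light in m/s

def bunch_spacing(f_bunch_hz):
    """Return the bunch spacing in time (s) and in distance (m), assuming v = c."""
    period_s = 1.0 / f_bunch_hz
    return period_s, C_LIGHT * period_s

period, distance = bunch_spacing(40.08e6)

# Rough estimate of the number of bunch slots on a 27 km ring (rounded circumference).
slots = 27e3 / distance
filled = 0.75 * slots      # "three fourths" of the slots, as stated in the text
```

With these inputs the spacing comes out close to 25 ns and 7.5 m, and the ring holds roughly 3600 slots. Three fourths of that is about 2700, of the same order as the approximately 2800 bunches quoted; the difference reflects the rounded inputs used here.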

The TTC distributes the bunch clock and the orbit signal, which allow precise identification of the event number. They are derived from the LHC RF generators and their frequencies vary slightly during acceleration, as they are synchronous to the circulating beams. The timing properties of these signals are critical as the detector electronics, the DAQ system and the beam instrumentation work synchronously to the machine and consequently rely on those signals for detector synchronization, trigger system alignment, assignment of bunch crossing to data, and pipeline synchronization. Thus, the delays associated with the different signal paths and the jitter properties of the signals are to be strictly controlled.



Figure 5.2: Bunch structure.
#### The TTCrx

The TTCrx [13] [14] is the radiation-hard ASIC receiver which acts as the interface between the TTC system and the detector front-end electronics. It is composed of a full custom part for the analog and timing-critical functions, plus a standard cell implementation for the digital logic and the non-time-critical functions. Within the full custom part, clock and data are recovered and a fine de-skewing function is implemented. The digital part of the chip contains several internal registers used for control and monitoring.

The TTCrx receiver is equipped with all signals necessary to synchronize the detectors. The 40 MHz LHC clock is extracted from the serial data stream and fed into two independent high-resolution phase shifters which provide a fine programmable delay in steps of 104 ps between 0 and 25 ns. An additional coarse delay register allows a compensation range of up to 16 bunch-crossing intervals, which can be used to compensate for the propagation delays associated with the detectors and their electronics. The bunch counter and event counter registers keep track of bunch and event collision numbers.
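The interplay of the coarse and fine delays can be sketched as follows. This is an illustration only: `split_delay` is a hypothetical helper, not the TTCrx register interface, and it assumes that the coarse register counts whole 25 ns bunch-crossing intervals while the fine phase shifter covers the remainder in 104 ps steps.

```python
BX_NS = 25.0            # nominal bunch-crossing interval quoted in the text
FINE_STEP_NS = 0.104    # fine phase-shifter step quoted in the text
COARSE_RANGE_BX = 16    # assumed total compensation range, in bunch crossings

def split_delay(delay_ns):
    """Split a delay into (coarse bunch crossings, fine steps, residual error in ns)."""
    if not 0.0 <= delay_ns < COARSE_RANGE_BX * BX_NS:
        raise ValueError("delay outside the assumed compensation range")
    coarse = int(delay_ns // BX_NS)
    # Round the remainder to the nearest fine step.
    fine = round((delay_ns - coarse * BX_NS) / FINE_STEP_NS)
    realized = coarse * BX_NS + fine * FINE_STEP_NS
    return coarse, fine, delay_ns - realized

coarse, fine, error = split_delay(60.0)
```

For a requested delay of 60 ns this yields two coarse intervals plus 96 fine steps, leaving a residual well below one fine step.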

The timing requirements of the TTC system are strict: an additional ASIC component has to be used as a complement of the TTC system in all the situations where the TTCrx clock jitter proves to be excessive. The quartz crystal based phase-locked loop (QPLL [15]) can reduce the jitter level to a cycle-to-cycle jitter of 22 ps r.m.s. from 76 ps r.m.s. at the output of the TTCrx.

### 5.2.2 Trigger

In Fig. 5.2, particle bunches approaching the center of the detector for a collision are shown. It is also shown how the collisions between the two bunches are in fact events of discrete type, where a collision of two *particles* (i.e. protons and/or heavy ions) is actually a collision between the partons (i.e. quarks and gluons) which form them. Only a head-on collision of two partons can have enough energy to give rise to an interesting event. Particle bunches cross at the rate of 40 MHz. Despite the large number of protons per bunch (i.e. of the order of thousands), due to the granular structure of the bunch, the proton collisions happen at a rate that is only of the order of 100 Hz. Additionally, a significant percentage of these collisions is not interesting from the point of view of new physics: they are either already known, or *new* particles are in fact produced at a rate of the order of a few Hz, if any production takes place at all. At the same time, not all the produced data can be either driven out of the detector (due to bandwidth limitations) or stored (due to storage limitations); thus a very precise event selection policy has to be applied.

The data *abundance problem* is usually addressed in the experiments by the technique called *triggering*: the data are successively selected through various levels of decisions, called *trigger levels* (e.g. three levels in ATLAS and four<sup>3</sup> levels in ALICE [26]). The first level trigger, or level-1 (L1) trigger, reduces the rate from 40 MHz to 100 Hz. The L1 trigger's *accept* signal has important latency constraints, as it has to be decided *promptly* whether the data are to be processed further or dismissed immediately. Data from the sub-detectors are collected and transferred to the processors outside the detector. All these operations have to be performed within an interval of a few $\mu$s, as that is the time available before the information on the collision falls off a front-end pipeline and is thus lost.

### 5.2.3 Control

Two types of commands can be delivered with the TTC system [14]: broadcast commands and individually addressed data. Broadcast commands are used to distribute messages to all TTC receivers in the system; these messages are used, for example, to reset the event and the bunch counters. Individually addressed commands are implemented in the TTC system to transmit user-defined data and commands over the network. Each TTCrx can be addressed independently, as each one is identified in the distribution network by a unique 14-bit channel identification number. The individually addressed commands are either aimed at the TTC receivers themselves to control the receiver operation (e.g. regulating de-skewing), or the data are

<sup>3</sup>i.e. considering the so-called *past-future protection* system as the fourth level.

intended for the external electronics.

Both the broadcast and the individually addressed commands are transmitted over the TTC network using the frame formats [15] depicted in Fig. 5.3. In both frames the first bit is set to "0" as a *start of packet*, the second one defines the frame type ("0" for broadcast, "1" for individually addressed), and the last one is set to "1" as an *end of packet*. The information in the frames is protected through a single-error-correcting, double-error-detecting *Hamming* scheme: 5 check bits protect the 8-bit data packet of broadcast commands, and 7 check bits protect the 32-bit data packet of individually addressed commands.
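The check-bit counts quoted above follow from the standard Hamming bound plus one overall parity bit for double-error detection. The sketch below only counts bits; it does not reproduce the actual TTC code matrices, and the function names and the frame-length bookkeeping are illustrative assumptions based on the frame description in the text.

```python
def sec_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error-correcting Hamming code)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r

def secded_check_bits(data_bits):
    """One extra overall parity bit adds double-error detection."""
    return sec_check_bits(data_bits) + 1

def frame_length(data_bits):
    """Start bit + frame-type bit + data + check bits + stop bit (assumed layout)."""
    return 1 + 1 + data_bits + secded_check_bits(data_bits) + 1

broadcast_checks = secded_check_bits(8)
addressed_checks = secded_check_bits(32)
```

With this bookkeeping, a broadcast frame is 16 bits long and an individually addressed frame is 42 bits long, so each frame spans many bunch crossings when sent at one bit per crossing.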

### 5.2.4 Line Coding in TTC System

Two channels are time multiplexed and transmitted in the TTC system [14]. Channel A is reserved for the L1 trigger information only. Channel B is used for delivering the slow control information. Only one bit of information is delivered per channel per bunch crossing for a total of 80 Mb/s. The L1A is a 1-bit information, either "pass on" or "reject" the data, and is not protected by any error control scheme. The information on channel B is instead formatted over multiple bunch crossings to create the frames shown in Fig. 5.3.



Figure 5.3: Control data frames of broadcast (top) and individually addressed formats (bottom).

The two time-division multiplexed channels are bi-phase-mark encoded before transmission over the network. This line code consists of representing a logical "1" as a pair of different bits ("10" or "01") and a logical "0" as two equal bits ("00" or "11"). It thus requires a line frequency that is twice the data bandwidth. As every logical level at the start of a cell is inverted with respect to the level at the end of the previous cell, this encoding scheme provides a very high number of transitions, at least one every encoded bit that is sent. Moreover, the data stream is DC-balanced. The coding scheme is sketched in Fig. 5.4.
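The encoding rule described above can be sketched in a few lines. This is an illustrative model rather than the TTC implementation: the ordering of channels A and B within a bunch crossing and the initial line level are assumptions, and the function names are hypothetical.

```python
def interleave(channel_a, channel_b):
    """Time-division multiplex two equally long bit sequences as A0, B0, A1, B1, ..."""
    if len(channel_a) != len(channel_b):
        raise ValueError("both channels must carry one bit per bunch crossing")
    out = []
    for a, b in zip(channel_a, channel_b):
        out.extend((a, b))
    return out

def biphase_mark_encode(bits, level=0):
    """Encode bits into half-cell line levels (two line levels per input bit)."""
    line = []
    for bit in bits:
        level ^= 1          # transition at the start of every cell
        line.append(level)
        if bit:
            level ^= 1      # extra mid-cell transition encodes a logical 1
        line.append(level)
    return line

def biphase_mark_decode(line):
    """A cell whose two halves differ is a logical 1, otherwise a logical 0."""
    return [int(line[i] != line[i + 1]) for i in range(0, len(line), 2)]

muxed = interleave([1, 0], [0, 1])        # channel A and channel B bits
encoded = biphase_mark_encode(muxed)
decoded = biphase_mark_decode(encoded)
```

Because every cell starts with a transition, the line never stays at one level for more than two half-cells, which is what makes clock recovery at the receiver straightforward.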



Figure 5.4: Time division multiplexed bi-phase-mark encoding.

## 5.3 The LHC Upgrade

At the time of writing, even though LHC is not operational yet, the HEP community has already started planning a luminosity upgrade of the accelerator which in turn requires detector and relevant equipment upgrades.

One of the motivations for such an upgrade is the fact that the statistical error of a measurement decreases in inverse proportion to the square root of the number of measurements; thus, to divide the statistical measurement error by 2, 4 times as many measurements have to be taken. After approximately 5 years of LHC operation at full luminosity, an upgrade of the machine is foreseen to improve the precision within a reasonable time frame. Considering the time it takes to develop the required components in HEP R&D processes, now is considered the right time to start such projects.
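The scaling behind this argument can be stated explicitly. As a brief sketch, with $N$ the number of measurements and $\sigma_{stat}$ the statistical error (symbols introduced here for illustration):

$$\sigma_{stat} \propto \frac{1}{\sqrt{N}} \quad\Longrightarrow\quad \frac{\sigma_{stat}(4N)}{\sigma_{stat}(N)} = \frac{1}{\sqrt{4}} = \frac{1}{2}$$

By the same relation, a tenfold increase in the collected data would reduce the statistical error by a factor of $\sqrt{10} \approx 3.2$.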

The upgraded machine is called Super-LHC or S-LHC. It is expected that this upgrade will increase the luminosity by a factor of 10 which, in turn, will lead to 10 times the amount of data to be transferred from the detector to permanent data storage. This will certainly impact the detectors, which will have to be upgraded together with the related electronics.

Another issue is that, even if the LHC stayed as it is with no luminosity upgrade, such projects would still be considered necessary, because after the foreseen operation period of roughly 10 years, radiation damage in detectors, data transmission fibers and relevant electronics would require an upgrade anyway. In this case the upgrade would serve to increase the performance of the overall system, leading to more precise results, and to lower the maintenance costs.

### 5.3.1 Communication Physical Layer

Considering the LHC, optical links seem to be the natural solution for data readout due especially to their high inherent bandwidth, but also due to other properties such as galvanic isolation, low electromagnetic interference, low atomic number of the materials involved, and low cabling weight.

The detectors at the LHC will be subject to harsh radiation levels due to the high number of colliding particles and the high frequency of collisions. The optical link components which will sit inside or in the proximity of the detectors have to be radiation hard. Radiation hardness, though, is a concern for only a few applications, such as HEP, space missions and weaponry; thus radiation-hard components cannot easily be bought off-the-shelf. The optical link components are often chosen among commercial products to minimize the customization required, so they need to be rigorously tested and qualified for the radiation hardness demanded by HEP experiments. The ASICs, in contrast, are custom designed and are subject to the same set of tests. They achieve radiation hardness through non-standard layouts in order to guarantee functionality and reliability over the 10 years of expected LHC lifetime.

In the framework of future luminosity improvements of the LHC, a new optical transmission system is being developed in which the link is bidirectional and adaptable to different link configurations and functionality. This new link is named the Versatile Bi-Directional (VBD) link, while GigaBit Transceiver (GBT) is the name chosen for its transceiver ASIC. The VBD link upgrades previously designed systems, namely the Timing, Trigger and Control (TTC) system [77], and relevant components such as the Gigabit Optical Link (GOL) ASIC [78]. The new link uses a newer technology and thus reaches a much higher speed than the existing one. The present TTC can process 2 bits per 25 ns whereas 120 bits will be available in the new system (corresponding to a data bandwidth increase from 80 Mb/s to 4.8 Gb/s), allowing many improvements to the functionality of the system. Moreover, the new system will be subject to higher error rates compared to the existing one, due to exposure to higher levels of radiation in S-LHC [79] and due to the higher speed of the link.

### 5.3.2 Gigabit Optical Link - GOL

A transmitter ASIC, the GOL [78], capable of operating with two of the most common data transmission protocols, was developed so that commercial components can be used in the parts of the link that do not sit in the radiation environment. Fig. 5.5 depicts the architecture. The transmitter ASIC was designed using radiation-tolerant layout practices that guarantee tolerance to irradiation effects up to the levels necessary for the LHC experiments.

The transmitter ASIC performs the function of a serializer and can operate in four different modes that are combinations of two common transmission protocols (8b/10b or CIMT) and two data rates (0.8 Gbit/s and 1.6 Gbit/s). The data input comes from a data bus operating as either a 16-bit or a 32-bit bus synchronously with the LHC clock (running at 40 MHz), resulting in data bandwidths of 640 Mbit/s



Figure 5.5: Architecture of GOL.

and 1.28 Gbit/s respectively, for serial data rates of 800 Mbit/s and 1.6 Gbit/s. Depending on the chosen line coding, either a G-Link or a Gbit Ethernet/Fiber Channel receiver can be used at the other end of the link. Once serialized, the encoded data can be used to drive either a laser, or a 50 Ohm line. In the case of the optical transmission, due to radiation effects, an increase in the threshold current of the laser diodes over the lifetime of the experiments is expected. To compensate for this, the laser-driver employs an internal modulator and a bias current generator that can be programmed to sink currents between 0 and 55 mA. In the GBT system, a similar functionality is implemented by two current mode D/As developed in the framework of this thesis.
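The relation between the quoted data bandwidths and serial rates can be checked with a short calculation. It assumes, as the numbers in the text imply, that both line codes add 25% overhead, i.e. 20 line bits per 16 payload bits:

$$16\ \text{bit} \times 40\ MHz = 640\ Mbit/s, \qquad 640\ Mbit/s \times \frac{20}{16} = 800\ Mbit/s$$

$$32\ \text{bit} \times 40\ MHz = 1.28\ Gbit/s, \qquad 1.28\ Gbit/s \times \frac{20}{16} = 1.6\ Gbit/s$$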

## 5.4 Motivation for the Replacement of the Current System

As presented briefly in the preceding sections, the current TTC system [14] is an optical broadcast network which is used for fast timing and slow control distribution at the LHC. The system provides for the broadcast of fast timing signals through all the transmission stages from the RF generators of the LHC machine to the outputs of the timing receiver, the TTCrx.

The TTC system users experience some restrictive features that in most cases resulted from technological limitations at the time of the system development. Today, the prospect of an upgrade of the LHC and the common availability of deep sub-micron technologies can lead to the development of extended functionality for the timing, trigger and control system, requiring the development of a new timing receiver, the GBT13 [10].

Some of the major drawbacks identified on the TTC system and/or on the TTCrx are as follows:

1. Transmission of a single trigger type,
2. Several bunch crossing periods are required to transmit broadcast commands and slow control data,
3. The system is unidirectional. This required the late addition of an I2C network in order to control the TTCrx, necessitating the presence of an additional control path,
4. Although broadcast commands and slow control data are protected by error correction codes, the trigger data are not,
5. If not synchronized with the TTC signal source, the TTCrx generates a random clock frequency. This is undesirable for purposes of system development and testing.

The drawbacks mentioned in the first two points can be avoided by increasing the transmission data rate, which is 80 Mbit/s in the current system. This will allow sending complex trigger information, broadcast commands as well as individually addressed commands and data during a single bunch crossing interval. The addition of a return path will address the third point and will allow the implementation of an efficient monitoring system, not only of the state of the TTC system itself but also of the detector electronics. Such a system closely resembles a bidirectional data link with added features to implement the synchronization of the detectors. It is thus conceivable to implement a bidirectional link that could work either as a general purpose data link or as a dedicated timing, trigger and control link. To implement such a scheme, the receiver and transmitter components must be radiation hard and tolerant to single event upsets. The use of error correction codes for data transmission must be considered.

## 5.5 GBT Transceiver

Today's particle detectors require high-speed<sup>4</sup> digital optical links for transmission of data between the sub-detectors and the data acquisition system. Typically, high speed data transmission is required for both the trigger system and data-readout system paths. Generally, those links are unidirectional with the transmitters located inside the detectors and the receivers situated in the counting rooms. Due

<sup>4</sup>multi Gbit/s

to the proximity to the collision point, the transmitters will be subject to high radiation doses over the lifetime of the experiments. Additionally, the large number of high-speed optical links required imposes strict limits on device costs. These constraints are particularly severe in the case of the LHC experiments, which have to handle an unprecedented amount of data (of the order of a few tens of petabytes). Moreover, in trigger links, data have to be transmitted with constant latency and synchronously with the LHC 40.08 MHz reference clock. This facilitates data alignment at the receiving end, before the data are fed to the trigger processors. Although commercial optical links and components can be found that meet the bandwidth requirements of all the planned LHC systems, those components have generally not been designed to withstand high levels of total dose. The few radiation-hardened devices which exist on the market have prohibitively high prices when the large number of links<sup>5</sup> required is taken into account. It was thus considered necessary to develop a dedicated solution that could meet the very special requirements of the LHC environment.

An important point is that only the ASICs located within the high radiation area need to be rad-hard, whereas the others, such as the ones in the counting room, do not. The electronics residing in the counting room may therefore be commercial, which requires the rad-hard full-custom ASICs to be compatible with commercial standards. To operate the transmitter ASIC with a standard receiver, some compatibility constraints on the design have to be followed: data formats, data rates and coding schemes have to be respected.

Additionally, for trigger links, the constant latency requirement imposes data rates that are multiples of the LHC master clock frequency. In most applications the detector systems require the transmission of 120 bits of data in a single LHC clock cycle; the required bit rate is therefore 4.8 Gbit/s, which is higher than in the existing system. To increase the bandwidth without paying a penalty on the detector's material budget, it is necessary to use fewer optical links at higher data rates rather than simply increasing the number of links.

<sup>5</sup>of the order of 100K in total for the four LHC experiments



Figure 5.6: GBT based link architecture.

The development being proposed by the project will thus act as a DAQ, TTC and SC link, while still leaving some flexibility to implement custom configurations. At the heart of such a link is the GigaBit Transceiver ASIC (GBT13). The global architecture of an optical link based on the GBT is shown in Fig. 5.6 [10].

In order to simplify the development, the embedding in existing systems and the maintenance of the links, the GBT interface proposes to adopt the *ethernet* standard as the high-level transport protocol. Moreover, to enhance system integration of the off-detector electronics, the GBT transceivers in the counting room will be implemented in FPGAs. This requires a communication protocol between GBT transceivers that can be implemented in standard FPGAs available on the market today.

As represented schematically in Fig. 5.7 [10], the GBT frame is composed of 120 bits which are transmitted during a single bunch crossing interval (i.e. 25 ns) resulting in a line data rate of 4.8 Gb/s. 4 bits are used for the frame Header (H) and 32 bits are used for Forward Error Correction (FEC). This leaves a total of 84 bits free for data transmission corresponding to a user bandwidth of 3.36 Gb/s. Four of these bits are reserved for the SC field, the TTC field is 16 bits wide and the "D" field is 64 bits wide, resulting in the following bandwidths: 160 Mb/s for SC, 640 Mb/s for TTC and 2.56 Gb/s for DAQ.
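The bandwidth figures quoted above follow directly from the field widths and the 25 ns bunch-crossing interval. The short sketch below redoes that arithmetic; the function and variable names are illustrative only, and the nominal 25 ns interval is used rather than the exact 40.08 MHz period.

```python
# Field widths in bits, as listed in the text for one 120-bit GBT frame.
FIELDS = {"H": 4, "SC": 4, "TTC": 16, "D": 64, "FEC": 32}
BUNCH_CROSSING_NS = 25.0  # nominal bunch-crossing interval (40 MHz)


def rate_gbps(bits, interval_ns=BUNCH_CROSSING_NS):
    """Bandwidth of a field sent once per interval; bits per ns equals Gb/s."""
    return bits / interval_ns


frame_bits = sum(FIELDS.values())
# User payload is what remains after the header and the FEC bits.
user_bits = frame_bits - FIELDS["H"] - FIELDS["FEC"]

print(f"frame: {frame_bits} bits, line rate {rate_gbps(frame_bits):.2f} Gb/s")
print(f"user:  {user_bits} bits, bandwidth {rate_gbps(user_bits):.2f} Gb/s")
for name in ("SC", "TTC", "D"):
    print(f"{name}: {rate_gbps(FIELDS[name]) * 1000:.0f} Mb/s")
```

The SC, TTC and D fields add up to the 84 user bits, so the three partial bandwidths sum to the 3.36 Gb/s user bandwidth.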

Fig. 5.8 shows the simplified GBT architecture. It implements the analog and digital functions typical of a transceiver, such as local clock generation, clock phase/frequency extraction and locking, serializing/de-serializing, line coding and error correction, line and optical device drivers and digital interfaces for various external systems. The diagram does not claim to be complete, and only very general relations between blocks are indicated by arrows. Blocks without linking arrows represent functionalities which are common to both the transmitter and the receiver.

Figure 5.8: Simplified GBT architecture.

Figure 5.9: Electrical transmitter architecture.

A simplified block diagram of the electrical transceiver is shown in Fig. 5.9. At every master clock cycle (the 40.08 MHz LHC clock) data are presented to the transmitter inputs as a 120-bit word. After scrambling and line coding [81], the data are serialized and driven at full speed either into a line (Line Driver, LD) or into an optical fiber (Optical Driver, OD). The receiver functions as the mirror image: a line or an optical receiver, denoted by LR and OR respectively in the figure, accepts the data; the Clock and Data Recovery (CDR) block extracts the right clock frequency and phase from the incoming data and re-times it; the full-speed stream is then de-serialized into several slower parallel branches, and the operations of the scrambler and line coder are undone to reconstruct the actual data word.

## 5.5.1 GBT Network Configurations

The GBT transceiver is designed to be flexible enough to address the requirements of all the experiments that will use it. A number of network configurations are foreseen.



Figure 5.10: Broadcast network configuration.

Fig. 5.10 shows the broadcast network configuration. This configuration is close to the one currently employed in the LHC TTC system architecture. Such a network operates in the trigger-continuous mode and has the potential to broadcast data to a large number of destinations (a maximum of 1024 for the current TTC system). However, a high fan-out requires the use of high optical power sources. This will certainly be outside the GBT13 laser driver capabilities, and a fan-out of 1-to-8, or at most 1-to-16, is foreseen. As in the present TTC system, the latency has to be strictly constant so that it can be used for the distribution of timing signals. In this configuration, part of the bandwidth will be reserved for the broadcast of trigger-specific commands while the remainder will be available for user data or slow control of the experiment.

Fig. 5.11 presents the broadcast network configuration with electrical fan-out. In both of the network configurations, blocks partitioned as Tx/Rx are the GBT transceivers. Downward and upward arrows represent the direction of the communication as from master (A in the figures) to slave (B in the figures) and vice versa, respectively. In both of the configurations, the master provides the master clock and the receivers frequency- and phase-lock their local clock generators to this master clock. When a slave is requested by the master to send data upwards, the only clock information it needs is the phase of the master clock but not the frequency. Since the receiver has already been locked, the frequency information is passed to the transmitter of the slave internally (the arrow going from Rx to Tx internally in the figures).

Figure 5.11: Fan-out network configuration.

The limited fan-out of the previous broadcast topology can be overcome by using a mixed optical/electrical tree as represented in Fig. 5.11. In this case a master transmitter broadcasts optically to several destinations (typically 8), which in turn retransmit the master's data to several other destinations further down the tree. This topology involves passive optical power splitting with electrical regeneration. A moderate optical fan-out of 1-to-8 will be typical. When compared to the simpler (fully passive) broadcast network, this topology adds latency due to the optical/electrical/optical regeneration steps.

The GBT transceiver also foresees other network topologies, namely point-to-point and bidirectional 1-to-N/N-to-1, with different configurations.

# 5.6 PLL Based Serializer Design

Serializers convert parallel data to serial data; they take at least two "slow" data streams and output a "fast" one by merging<sup>6</sup> them properly. This is fundamentally needed when a parallel bus between source and destination is not possible. For synchronous operation, the serializer must meet some strict timing requirements. Since the output has the highest speed, a timing error that is acceptably small in one of the "slow" streams may not be acceptable in the final "fast" stream. At the transmission rates required today (starting from a few Gbit/s), serializers are time-critical building blocks in which "simple" things become challenges, especially in a HEP environment.

<sup>6</sup>This is done via Time Division Multiplexing (TDM).

Serializers often employ Phase Locked Loops (PLL) to ideally nullify the timing errors, especially the one between master and local clocks. A PLL can be regarded as the analog heart of the transceiver. It produces and maintains a clock signal which is properly aligned with some master clock, in order to minimize transmission errors by adjusting the output transition instants of the serializer.

Fig. 5.12 shows the overall architecture of the serializer. It consists of a 120-bit master register; four 30-bit registers to divide the frame by 4; a frequency synthesizer consisting of a PLL with a two-stage feedback divider, one stage dividing by 4 and the other by 30, for a total division ratio of 120; four fast switches providing 4:1 MUX functionality; a decision circuit (DFF); and a line driver.

The overall operation is as follows: at every rising edge of the LHC clock, $f_{LHC}$, a 120-bit frame is loaded into the large register as a 120-bit word and presented to the four 30-bit registers. At every rising edge of the *Load* signal, the 30-bit parts are loaded into the 30-bit registers. The PLL locks the local clock generator to the LHC clock and an output clock frequency of 120x40MHz=4.8GHz is produced<sup>7</sup>. The Q1-Q4 quadrature clock phases control the switches, and four parallel streams clocked at $f_{BIT}/4$ are serialized. The final decision circuit cleans up the output stream by re-clocking it at full speed. The line driver, the last building block of the serializer, feeds the data to the next stage over a transmission line.
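The data path just described can be illustrated with a purely behavioral sketch of 4:1 time-division multiplexing. The bit-to-register mapping used here (bit i of the frame goes to register i mod 4) is an assumption made so that the serial output reproduces the original bit order; the text only states that the frame is divided over four 30-bit registers. The function names are hypothetical, and no timing, jitter or re-clocking is modeled.

```python
import random


def split_frame(frame_bits, lanes=4):
    """Distribute a frame over `lanes` registers.

    Assumed mapping: bit i goes to register i % lanes, so each register
    holds 30 bits of a 120-bit frame.
    """
    return [frame_bits[k::lanes] for k in range(lanes)]


def serialize(registers):
    """4:1 TDM: in every slow-clock period the four switches, enabled in
    turn by the phases Q1..Q4, pass one bit from each register."""
    stream = []
    for slow_tick in range(len(registers[0])):
        for phase in range(len(registers)):
            stream.append(registers[phase][slow_tick])
    return stream


rng = random.Random(0)                    # fixed seed, arbitrary test pattern
frame = [rng.randint(0, 1) for _ in range(120)]
registers = split_frame(frame)
stream = serialize(registers)
print(len(registers), "registers of", len(registers[0]), "bits")
print("serial stream matches frame:", stream == frame)
```

Each register is read at one quarter of the bit rate, which is why a timing error on any of the four phases appears directly on the full-speed output.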

Multiplexers use both edges of the clock which controls them. This manifests itself as output jitter if the PLL output, or equivalently the output of the VCO, has a duty-cycle error. Because of the PLL output duty-cycle error introduced by the differential-to-single-ended converter, 4 switches are used in the final design instead of a 4:1 MUX. Considering radiation effects, SEU and SEL, the redundancy scheme of majority voting is used within the feedback clock divider.

<sup>7</sup>LHC clock frequency is a little bit higher than 40MHz, though.

Figure 5.12: Architecture of the serializer.

### 5.6.1 PLL Architecture

Fig. 5.13 shows the conventional Charge-Pump Phase Locked Loop (CP-PLL) architecture adopted [92] and used within the serializer. It consists of the phase/frequency detector (PFD), which compares the feedback divider output to the reference clock *RefClk*; two current sources $I_{cp}$, which pump/sink current into/from the low-pass filter (LPF); the voltage controlled oscillator (VCO), which outputs a clock signal whose phase and frequency are functions of the control voltage (i.e. the LPF output); and a clock divider, which is used as a clock generator for the serializer and which divides the local VCO clock by 120, closing the loop.

It is a *control loop* with a *plant* and a *controller* adjusting the behavior of the plant. The VCO, consisting of the differential VCO (Diff. VCO) and the differential-to-single-ended converter (D2S), is the plant. It is controlled by the PFD, which generates the error signal; the CP, which consists of adjustable current sources ($I_{cp}$); the LPF, which generates the control voltage for the VCO; and the feedback divider, which consists of two dividers in series: %4 and %30. The divider also functions as the clock generator for the serializer. During operation, the PFD generates a digital error signal (*up* and *down*) depending on the phase/frequency difference at its inputs, that is, *RefClk* and the output of the feedback divider. The digital error signal modulates the current, $I_{cp}$, being pumped into the loop filter. The loop filter averages this digital signal to be used as the control signal for the VCO. The VCO output is divided by 120 and fed back to the input of the PFD, such that the CP-PLL outputs a clock signal which is 120 times the *RefClk* frequency and is also phase-locked to it (second-order behavior, see Appendix A).

Figure 5.13: Architecture of the charge-pump PLL.

### 5.6.2 Loop Parameter Selection

In this subsection, the basic charge-pump PLL model in transfer function form based on an assumption of small error (linear loop) and a narrow bandwidth as compared to the input frequency (continuous-time approximation) will be presented. A continuous-time approximation is not valid if the loop bandwidth approaches the input frequency where the discrete-time or sampled nature of the loop must be recognized. The theoretical and practical stability limits are also presented such that a proper choice for loop parameters could be made.

### The Model

Being a mixed-signal, time-varying, sampled feed-back system, the PLL is usually approximated and analyzed within the framework of *control theory*. Appendix A gives a practical review of second-order feed-back loop behavior.

The behavior of the CP-PLL shown in Fig. 5.13 can be characterized by the conventional transfer function form as

$$T(s) = \frac{\omega_n^2(\tau s + 1)}{\frac{s^2}{N} + 2\xi s \frac{\omega_n}{N} + \frac{\omega_n^2}{N}} \tag{5.1}$$

in which

$$\tau = RC_1 \tag{5.2}$$

$$\omega_n = \sqrt{\frac{K_o I_p}{2\pi C_1 N}} \tag{5.3}$$

$$\xi = \frac{\tau \omega_n}{2} \tag{5.4}$$

$$K = \frac{K_o I_p R}{2\pi N} \tag{5.5}$$

where $\tau$ is the time constant of the loop filter in seconds, $\omega_n$ is the natural frequency of the loop in rad/s, $\xi$ is the damping factor, K and $K_o$ are the gain of the loop in rad/s and the gain of the VCO in rad/s/V, respectively, and N is the feedback divide ratio. In such a loop, a VCO with a gain of $K_o$ and a divide ratio of N cannot be distinguished from a VCO with a lower gain of $K_{vco} = K_o/N$ and no feedback divider. Therefore Eq. 5.3 and Eq. 5.5 can be rewritten respectively as:

$$\omega_n = \sqrt{\frac{K_{vco}I_p}{2\pi C_1}} \tag{5.6}$$

$$K = \frac{K_{vco}I_pR}{2\pi} \tag{5.7}$$

The quantities are interrelated by

$$K = 2\xi\omega_n \tag{5.8}$$

$$K\tau = 4\xi^2 \tag{5.9}$$

$$K/\tau = \omega_n^2 \tag{5.10}$$

where any two of the quantities $K$, $\tau$, $\omega_n$ and $\xi$ unambiguously define a linearized, time-averaged control loop. As the above equations suggest, to the extent that the approximations are valid, the CP-PLL has exactly the same small-scale behavior as conventional PLLs with the same loop parameter values [93].
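As a quick consistency check, Eqs. 5.2 to 5.5 can be evaluated numerically and compared against the relations of Eqs. 5.8 to 5.10. The sketch below does this; the component values are hypothetical and chosen only for illustration, not taken from the actual design.

```python
import math


def loop_parameters(k_o, i_p, r, c1, n):
    """Return (tau, omega_n, xi, K) following Eqs. 5.2 to 5.5.

    k_o: VCO gain in rad/s/V, i_p: pump current in A,
    r: loop resistor in ohm, c1: loop capacitor in F, n: divide ratio.
    """
    tau = r * c1                                               # Eq. 5.2
    omega_n = math.sqrt(k_o * i_p / (2 * math.pi * c1 * n))    # Eq. 5.3
    xi = tau * omega_n / 2                                     # Eq. 5.4
    k = k_o * i_p * r / (2 * math.pi * n)                      # Eq. 5.5
    return tau, omega_n, xi, k


# Hypothetical values, for illustration only.
tau, omega_n, xi, k = loop_parameters(
    k_o=2 * math.pi * 1e9, i_p=10e-6, r=20e3, c1=150e-12, n=120
)

print("K      == 2*xi*omega_n :", math.isclose(k, 2 * xi * omega_n))
print("K*tau  == 4*xi^2       :", math.isclose(k * tau, 4 * xi**2))
print("K/tau  == omega_n^2    :", math.isclose(k / tau, omega_n**2))
```

Replacing `k_o / n` by a single gain reproduces Eqs. 5.6 and 5.7 without changing any of the three checks.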

As [92] analyzes in detail, one of the great benefits of CP-PLLs is that they result in zero static phase error with a passive low-pass filter. Achieving zero static phase error in a conventional PLL requires an active filter with large DC gain (See the example PLL design in Appendix A).

However, practical circuits will impose some shunt loading across the passive filter impedance. Denoting this parasitic resistance as  $R_s$ , the actual *static phase* error or loop stress can be written as

$$\theta_s = \frac{2\pi\Delta\omega}{K_o I_p R_s} \tag{5.11}$$

where  $\Delta \omega$  is the frequency offset between the input signal and the free-running frequency of the VCO<sup>8</sup>.

<sup>8</sup>Actually the output of the feed-back divider (%120).

For any control loop it is vital to evaluate the stability limits. Because of its switching nature, the CP-PLL is a time-varying network to which a simple transfer function analysis is not strictly applicable. Only if the bandwidth of the loop is very small compared to the frequency of the input reference signal, $\omega_n \ll \omega_i$, is the approximation valid [93]. The stability limit of the discrete-time system is extracted from the z-plane representation of the loop zeroes and poles as

$$K\tau = \left[\frac{\pi}{\omega_i \tau} \left(1 + \frac{\pi}{\omega_i \tau}\right)\right]^{-1} \tag{5.12}$$

where  $K\tau$  is the normalized gain of the loop. Even though this is theoretically the upper limit that the loop operating point is allowed to reach, there are issues specific to CP-PLLs requiring attention.

For example, ripple on the control voltage driving the VCO could cause instability. Upon each cycle of the PFD, the pump current $I_p$ is driven into the filter impedance, which responds with an instantaneous voltage jump, or *proportional term*, equal to $\Delta V_{ctrl} = I_p R$. At the end of the charging interval, i.e. the period over which the *integral term* accumulates, the pump current switches off and a voltage jump of equal magnitude occurs in the opposite direction. The frequency of the VCO follows these jumps, so that its output exhibits frequency excursions equal to $\Delta \omega_o = K_o I_p R = 2\pi K$ rad/s. Therefore, a practical stability limit, the so-called *overload limit*, inherently manifests itself as

$$K\tau < \frac{\omega_i \tau}{2\pi} \tag{5.13}$$

which means that the loop gain should be small enough that the voltage excursions on the control line driving the VCO cannot cause frequency excursions at the VCO output which exceed the input reference frequency. In practical applications, loop parameters are arranged such that the loop operating point does not exceed 10% of the overload limit, to leave some margin.

Fig. 5.14 shows the theoretical (z-plane) and practical (overload) stability limits. It is apparent that the actual restriction on the loop gain is the overload limit which sets in at a lower value of gain than does the theoretical stability limit for any practical circuit.
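The two limits can be compared numerically with a few lines of code. The sketch below evaluates Eq. 5.12 and Eq. 5.13 as functions of the normalized reference frequency $\omega_i \tau$; the helper names and the sample values are illustrative assumptions. Algebraically, the two expressions coincide at $\omega_i \tau = \pi$, and the overload limit is the lower of the two for any larger $\omega_i \tau$.

```python
import math


def z_plane_limit(wi_tau):
    """Normalized gain K*tau at the theoretical stability limit (Eq. 5.12)."""
    x = math.pi / wi_tau
    return 1.0 / (x * (1.0 + x))


def overload_limit(wi_tau):
    """Normalized gain K*tau at the overload limit (Eq. 5.13)."""
    return wi_tau / (2.0 * math.pi)


def is_acceptable(k_tau, wi_tau, margin=0.10):
    """True if the operating point stays below `margin` times the overload
    limit (10% in the text) and below the z-plane limit."""
    return (k_tau <= margin * overload_limit(wi_tau)
            and k_tau < z_plane_limit(wi_tau))


for wi_tau in (10.0, 100.0, 1000.0):   # illustrative values only
    print(f"wi*tau={wi_tau:7.1f}  z-plane={z_plane_limit(wi_tau):9.2f}  "
          f"overload={overload_limit(wi_tau):8.2f}")
```

A call such as `is_acceptable(1.0, 100.0)` then tells whether a candidate normalized gain respects the 10% margin mentioned above.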

Figure 5.14: Theoretical and practical stability limits.

Under the continuous-time approximation, however, the loop is unconditionally stable (which is obviously not the case in real life), as seen in Fig. 5.15, which shows the Bode plots of the transfer function of Eq. 5.1 for a specific over-damped parametrization. The phase plot shows a single shift of 90 degrees, which means the loop behaves as if there were only a single pole; it is therefore unconditionally stable.

Jitter performance of the loop is as vital as stability. Even though all the components within the loop introduce some jitter to the final result, the two dominant contributors to PLL output jitter are the reference and VCO jitter, which pass through the control loop via the following transfer functions, respectively:

$$T_{ref2out}(s) = \frac{2\xi\omega_n Ns + N\omega_n^2}{s^2 + 2\xi\omega_n s + \omega_n^2} \tag{5.14}$$

$$T_{vco2out}(s) = \frac{s^2}{s^2 + Ks + \frac{K_o I_p}{2\pi NC_1}} \tag{5.15}$$

Figure 5.15: Bode plots for continuous-time approximated CP-PLL transfer function for a specific parameter set (plot title: |[Y/U](jw)|, u=ClkLHC, y=ClkPLL/N).

Eq. 5.14 has a low-pass nature whereas Eq. 5.15 has a high-pass character: the PLL low-pass filters the reference jitter while high-pass filtering the VCO jitter. This introduces a trade-off between loop speed and output jitter. If the VCO is clean whereas the reference is not, then configuring the loop to be slow, that is, choosing a lower $\omega_n$, is desirable; if the VCO is not clean and the reference is, then having a faster loop response, that is, choosing a higher $\omega_n$, is desirable. In other words, a PLL can filter either reference or VCO jitter, and the trade-off between the two results in an *optimum* value for $\omega_n$. Fig. 5.16 and Fig. 5.17 show the plots of the transfer functions of Eq. 5.14 and Eq. 5.15, respectively, for a specific parameter set.
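The low-pass and high-pass characters of the two jitter transfer functions can be seen by evaluating their magnitudes well below and well above the natural frequency. The following sketch does so on the imaginary axis, using $K = 2\xi\omega_n$ (Eq. 5.8) and $\omega_n^2 = K_o I_p / (2\pi N C_1)$ (Eq. 5.3) to write the denominator of Eq. 5.15 in terms of $\xi$ and $\omega_n$. The parameter values are hypothetical.

```python
import math


def t_ref2out(f_hz, omega_n, xi, n):
    """Eq. 5.14 evaluated at s = j*2*pi*f."""
    s = 1j * 2 * math.pi * f_hz
    return ((2 * xi * omega_n * n * s + n * omega_n**2)
            / (s**2 + 2 * xi * omega_n * s + omega_n**2))


def t_vco2out(f_hz, omega_n, xi):
    """Eq. 5.15 evaluated at s = j*2*pi*f, with K = 2*xi*omega_n."""
    s = 1j * 2 * math.pi * f_hz
    return s**2 / (s**2 + 2 * xi * omega_n * s + omega_n**2)


# Hypothetical loop: natural frequency 2*pi*1 MHz, xi = 1, N = 120.
OMEGA_N = 2 * math.pi * 1e6
XI = 1.0
N = 120

for f in (1e3, 1e6, 1e9):
    print(f"f={f:8.0e} Hz  |T_ref2out|={abs(t_ref2out(f, OMEGA_N, XI, N)):10.4f}"
          f"  |T_vco2out|={abs(t_vco2out(f, OMEGA_N, XI)):10.6f}")
```

Far below the natural frequency the reference path approaches a gain of N while the VCO path tends to zero; far above it the roles are exchanged.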

### Numerical Parametrization

Having the CP-PLL behavioral and jitter models, together with the analytic stability limit expressions, the next step is to choose numerical values for proper operation. For this purpose a C/C++/Octave application named CaPPeLLo was developed<sup>9</sup> for fast evaluation of the loop behavior. Appendix B provides the algorithm cores used.

<sup>9</sup>CaPPeLLo stands for Charge-Pump Phase Locked Loop parametrizer.

Figure 5.16: Jitter transfer function from reference to PLL output.

Figure 5.17: Jitter transfer function from VCO to PLL output.

The transceiver for which the CP-PLL based serializer presented in this thesis is designed will be used in different<sup>10</sup> conditions<sup>11</sup>. Thus the CP-PLL is designed to be fully programmable on-chip: the charge-pump current $I_p$ can be set from $1\mu A$ to $20\mu A$ in steps of $1\mu A$, and the loop filter capacitance $C_1$<sup>12</sup> from 15pF to 450pF in steps of 15pF. The loop resistor is also programmable, to place the zero accordingly and to adjust the proportional term on the control voltage driving the VCO.

Fig. 5.18 shows the stability plot, a subset of possible operating points for different damping factors, $\xi$, and the settable extremes available by adjusting $I_p$ and $C_1$. For all the parameter sets, only 4 points are chosen for simulation, where the natural frequency, $\omega_n$, is set to 500kHz, 1MHz, 1.5MHz and 2MHz while the charge-pump current, $I_p$, is set to 5$\mu$A, 10$\mu$A, 15$\mu$A and 20$\mu$A respectively. The loop resistor is also adjusted to keep $\omega_n$ and $\xi$ at their nominal values. As seen from Fig. 5.18, not all the settable operating points are stable. This allows users to explore possible operational ranges and to better define practical limits associated with the environmental conditions.
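The way a target operating point maps onto the programmable steps can be sketched as follows. This is not the CaPPeLLo algorithm (which is given in Appendix B); it is a minimal illustration that inverts Eq. 5.6 for $C_1$, snaps the result to the nearest 15 pF step, and then obtains the resistor from Eqs. 5.2 and 5.4. The VCO gain `K_VCO_ASSUMED` is a hypothetical number, the second capacitor $C_2$ is ignored, and $\omega_n$ is taken in rad/s.

```python
import math

I_P_STEPS = [k * 1e-6 for k in range(1, 21)]     # 1 uA to 20 uA, 1 uA steps
C1_STEPS = [k * 15e-12 for k in range(1, 31)]    # 15 pF to 450 pF, 15 pF steps

K_VCO_ASSUMED = 2 * math.pi * 600e6  # rad/s/V, hypothetical effective VCO gain


def choose_c1_and_r(i_p, omega_n, xi, k_vco):
    """Pick the closest available C1 and the R giving the target damping.

    Eq. 5.6 inverted: C1 = K_vco * I_p / (2 * pi * omega_n^2)
    Eqs. 5.2 and 5.4: R  = 2 * xi / (omega_n * C1)
    Returns (c1, r, omega_n_actual) for the snapped capacitor value.
    """
    c1_ideal = k_vco * i_p / (2 * math.pi * omega_n**2)
    c1 = min(C1_STEPS, key=lambda c: abs(c - c1_ideal))
    omega_n_actual = math.sqrt(k_vco * i_p / (2 * math.pi * c1))
    r = 2 * xi / (omega_n_actual * c1)
    return c1, r, omega_n_actual


c1, r, w_n = choose_c1_and_r(10e-6, 2 * math.pi * 1e6, 1.0, K_VCO_ASSUMED)
print(f"{len(I_P_STEPS) * len(C1_STEPS)} (I_p, C1) combinations in total")
print(f"C1 = {c1 * 1e12:.0f} pF, R = {r:.0f} ohm, "
      f"omega_n = {w_n / (2 * math.pi) / 1e6:.3f} MHz")
```

Because $C_1$ is quantized, the achieved natural frequency generally differs slightly from the target, which is why the function returns it alongside the component values.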

For each parametrization, a set of plots showing the adjustability limits are generated. Fig. 5.19 shows such a set as a function of parameter set index. It shows  $\omega_n$ , the natural frequency in rad/s,  $\tau$ , the time constant of loop filter in s,  $R_{MAX}$ , maximum value for loop resistor in Ohms, R, actual loop resistor value that must be chosen in Ohms, C1 and C2, the filter capacitances in F and pF respectively, proportional and integral terms in mV, and BL, the resulting noise bandwidth of the loop in Mrad/s. The operating points in Fig. 5.18 are results of parametrizations similar to the one seen in Fig. 5.19.

The model and the numerical parametrizations presented in this subsection are calculated analytically, therefore they must be verified by HDL<sup>13</sup> simulations. Especially the jitter transfer functions should be calculated numerically by investigating HDL simulation outputs. VerilogA simulation results are presented in the next section.

<sup>10</sup>Different radiation levels depending on where the electronics reside at the experimental pit or no radiation at all in the counting room.

<sup>11</sup>Different environmental conditions like temperature gradients depending on the link location.

<sup>12</sup>$C_2 = C_1/15$ is fixed.

<sup>13</sup>HDL stands for Hardware Description Language



Figure 5.18: A subset of selectable operating points.




Figure 5.19: A detailed parameter set.

# 5.6.3 Model Based Simulation Results

Even though a programmable device is desirable, as it gives users the chance to choose the best operating condition, it also makes it impossible to simulate all possible parameter sets during the design phase. Considering in addition the number of process and mismatch corners introduced by the technology used, only a small fraction of the possible operating points can be fully simulated at the behavioral level. 15 process corners are used: for 5 different technology dependent device parameters and for $\pm 10\%$ of $V_{dd}$ and temperatures of 125, 25 and -20 $^{\circ}C$. The operating points where the damping factor is equal to 4.67 and 1.0, as seen in Fig. 5.18, are fully simulated. The other operating points on the same plot are simulated partly. iVerilog, VerilogXL and VerilogA simulation results are presented in this subsection. Appendix B provides the actual verilog model cores used for obtaining the results given in this subsection.


#### Test-bench

Fig. 5.20 shows the verilog model used. The 40 MHz LHC reference clock is generated by dividing a 4.8GHz clock by 120. Not generating 40MHz directly allows a comparison between the PLL output and the fast clock by jitter probe *high*. For each simulation, one divider is selected using the path selector, S1. The jittered divider (%120) can introduce white and sinusoidal jitter such that the jitter performance of the loop can be evaluated numerically via *probe* A. The loop filter consists of R, $C_1$ and $C_2$; $R_s$ is added to imitate CP leakage or *shunt loading* of the filter impedance, which results in a *static phase error* or *loop stress*. Two VCOs, one ideal and one jittered, are used for a comparison via *probe* B. For each simulation, one VCO is selected using the path selector, S2. All the components except the VCO are ideal<sup>14</sup>. The VCO is first implemented at device level; control curve and duty cycle error parameters are then extracted and included in the verilog simulations.

Mainly two types of probes are used for further analysis: the ones measuring the difference in transition times (i.e. jitter probes, probe A and B) and the ones measuring instantaneous periods (i.e. XXX\_c2c\_period where XXX is PLL, REF and LOC.). The jitter probes show the locking process and the static phase errors whereas the period meters convey information relating to the cycle-to-cycle jitter. Appendix B gives practical definitions of these jitter metrics together with the basic algorithms used to calculate them.
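The two kinds of probes can be mimicked on lists of edge timestamps. The sketch below is not the verilog test-bench of Appendix B; it is a hypothetical stand-alone illustration in which a 40 MHz reference (nominal 25 ns period) carries sinusoidal jitter of 80 ps peak-to-peak at 1 MHz, and the two probe types are applied to its rising edges. All function names are assumptions.

```python
import math


def jittered_edges(period, n_edges, jitter_pp, jitter_freq):
    """Rising-edge times of a clock with sinusoidal jitter of peak-to-peak
    amplitude `jitter_pp` (s) at frequency `jitter_freq` (Hz)."""
    amp = jitter_pp / 2.0
    return [k * period
            + amp * math.sin(2 * math.pi * jitter_freq * k * period)
            for k in range(n_edges)]


def jitter_probe(edges, ideal_edges):
    """Difference in transition times between a clock and an ideal clock."""
    return [t - t0 for t, t0 in zip(edges, ideal_edges)]


def period_probe(edges):
    """Instantaneous periods: spacing between consecutive rising edges."""
    return [b - a for a, b in zip(edges, edges[1:])]


PERIOD = 25e-9
N_EDGES = 4000                      # 100 us of simulated time
ideal = [k * PERIOD for k in range(N_EDGES)]
edges = jittered_edges(PERIOD, N_EDGES, jitter_pp=80e-12, jitter_freq=1e6)

phase_error = jitter_probe(edges, ideal)
periods = period_probe(edges)
print(f"peak-to-peak time error: {(max(phase_error) - min(phase_error)) * 1e12:.1f} ps")
print(f"period spread: {(max(periods) - min(periods)) * 1e12:.2f} ps")
```

The time-error probe recovers the full injected amplitude, whereas the period probe only sees the edge-to-edge change of that error, which is much smaller for slow jitter.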

Fig. 5.21 shows the VCO control voltage and the outputs of the jitter probes low and high at the beginning of the locking process, simulated within the verilogA environment. First, the PFD decides in which direction $V_{ctrl}$ should be adjusted by pumping the CP current, $I_p$. This adjusts the VCO frequency, and the error at the inputs of the PFD gradually drops to a certain value. Once $V_{ctrl}$ and the error signal at the output of the PFD have almost settled, the phase error between the 4.8 GHz clock and the VCO clock begins to drop until it reaches the static phase error introduced by the leakage current flowing through the parasitic shunt loading resistor, $R_s$. The vertical line at 4 $\mu s$ shows the time instant after which the jitter metrics are calculated, since the jitter performance relates to the locked condition.

<sup>14</sup>Behavioral parameters are not extracted from device level simulations

Figure 5.20: Test-bench used for verilog simulations.

Figure 5.21: Locking process of a CP-PLL showing the control voltage of the VCO (top), the phase error at the inputs of the PFD (middle), the phase error at the output of the PLL (bottom), and the simulation time instance (vertical line at $4\mu s$) where the jitter statistics start to be collected.

It should be noted that the *Vctrl* signal in Fig. 5.21 has two dominant components in the time domain: the integral component, which is responsible for the averaged behavior, and the proportional component, which is responsible for the rippling behavior.

*Vctrl* actually never settles down but the amplitude of the proportional component drops down to a certain value which is one of the important aspects for the jitter performance of the loop. Fig. 5.22 shows a similar simulation result performed within iVerilog [9] and verilogXL. Considering the amount of time required for different simulators, different commercial and/or open-source simulators are used according to what is being measured. Fig. 5.22 shows the following signals:

- 1. 233 ps, ideal PLL output clock period (black curve, denoted as F)
- 2. jittered PLL output clock period (red curve, denoted as A, B and C)
- 3. ideal duty cycle error (yellow curve, denoted as G)
- 4. 10ps, PLL output duty cycle error (blue curve, denoted as H)
- 5. phase error of PLL output with respect to ideal 4.8 GHz clock generator sitting just before the jittered divider in 40MHz LHC clock generator of Fig. 5.20 (green curve, denoted as D and E)

Figure 5.22: Locking process of a CP-PLL in case of significant reference and VCO white noise (See text for details).

At the beginning, the loop is out of lock, thus the phase error changes randomly in the region denoted as D. Until the moment denoted as A, frequency acquisition occurs: the PLL output period plotted in red converges to its expected level. Just after the time instant denoted as A, the phase error plotted in green converges to zero with a significant amount of white noise. Almost immediately after the time instant A, the loop is phase/frequency locked to the reference. At the time instant B, the reference frequency drops to a certain value in a sharp step. The loop loses lock, hence the region denoted as E. Until the time instant C, the loop acquires frequency lock for the new reference frequency. The same process then repeats at the time instant denoted as C: almost immediately after C, the loop is phase/frequency re-locked to the reference.

### Simulation Results

To verify and cross-check what is expected from analytical parametrizations against how the verilog model behaves, simulations with different parameter sets are performed.

As an example, choosing different damping ratios by adjusting loop parameters results in different locking behaviors and jitter performances. Fig. 5.23 and Fig. 5.24 show verilog and verilogA simulation results of a PLL configuration for different values of  $\xi$  and  $\omega_n$  for specific parameter sets. Under-damped (bottom) and critically-damped (top) behavior filtering a 200ps peak-to-peak jitter of reference (black curve) is perfectly coherent with the analytical expectations as seen in Fig. 5.23. Reference period jitter suppression is also clearly visible as the peak-to-peak variation at the output of the feed-back divider (red curve) is very small compared to that of the reference. Another example is to consider  $\omega_n$ , the natural frequency of the loop; filtering of reference jitter should get worse as  $\omega_n$  increases<sup>15</sup>. Fig. 5.24 shows verilog simulation results for three values of  $\omega_n$  for a specific parameter set. Simulations are performed for exactly the same reference with exactly the same numerical values introduced as white jitter in order to make a better comparison. As expected from the analytical model, variation of instantaneous periods at the output of the feed-back divider (red curve) gets larger as  $\omega_n$  increases.

<sup>15</sup>This also makes VCO jitter suppression get better.

Figure 5.23: Instantaneous periods of the clock signals at the inputs of the PFD for $\omega_n = 100kHz$ and $\xi = 1.0$ (top), $\omega_n = 100kHz$ and $\xi = 0.3$ (bottom).

Figure 5.24: Instantaneous periods of the clock signals at the inputs of the PFD for $\xi = 4.67$ and $\omega_n = 0.5$MHz (top), $\xi = 4.67$ and $\omega_n = 1$MHz (middle), $\xi = 4.67$ and $\omega_n = 1.5$MHz (bottom).

Figure 5.25: Jitter peaking.

An important issue is the so-called *jitter peaking* in the transfer function, occurring at the natural frequency of the loop as seen in Fig. 5.25. Jitter peaking should be minimized so that a jitter component in the reference clock at a frequency equal to the natural frequency of the loop is not amplified at the output of the PLL. Jitter peaking (JP) is given in decibels by [93]

$$20\log_{10}(JP) = \frac{2.172}{\xi^2} \tag{5.16}$$

which is a function only of $\xi$. As an example, according to SONET specifications JP should be less than 0.1dB, thus a damping ratio of at least 4.66 is needed. To verify the jitter transfer function numerically, and especially to investigate the amount of jitter peaking at the behavioral level, sinusoidal reference jitter is introduced and the output of the PLL is observed. Fig. 5.26 shows the PLL jitter performance in the presence of sinusoidal reference jitter with a frequency equal to the natural frequency of the loop for a specific parameter set. The introduced reference jitter is 80ps peak-to-peak and the frequency chosen is 1MHz, which is also the $\omega_n$ of the loop. Therefore JP, if it exists, should manifest itself at this jitter frequency. However, for the specific simulation whose result is given in Fig. 5.26, no jitter peaking is observed; the PLL output has a peak-to-peak period jitter of 150fs. Similar simulations are repeated for different $\omega_n$ of the loop such that the jitter transfer function can be verified. For numerical verification, at least three points should be simulated, with $f_{Jr}$ denoting the sinusoidal reference jitter frequency: $f_{Jr} = \omega_n/10$, $f_{Jr} = \omega_n$, and $f_{Jr} = 10\omega_n$. Cycle-to-cycle jitter metrics are calculated and numerical values are obtained. Table 5.1 shows numerical results for an over-damped loop, where $T_{ref2out}$ and $T_{ref2local}$ are the jitter transfer functions from the reference to the PLL output and to the feedback divider output, respectively. They are redefined as

$$T_{ref2out} = \frac{J_{OUT}}{J_{REF}} \tag{5.17}$$

$$T_{ref2local} = \frac{J_{LOCAL}}{J_{REF}} \tag{5.18}$$

where  $J_{REF}$ ,  $J_{OUT}$  and  $J_{LOCAL}$  are the cycle-to-cycle jitter of reference, PLL output and feedback divider output, respectively. Cycle-to-cycle jitter is defined as the variance in instantaneous period values and thus, is an r.m.s. metric. In this simulation, no VCO jitter is introduced, since the phenomenon of jitter peaking occurs only for reference jitter.
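The ratios of Eqs. 5.17 and 5.18 can be evaluated from lists of instantaneous periods, as in the sketch below. It assumes that the jitter metric is the standard deviation of the period samples (an r.m.s. quantity); the helper names and the sample values are hypothetical and do not come from the simulations reported here.

```python
import math

def period_jitter_rms(periods):
    """Standard deviation of instantaneous periods (assumed r.m.s. jitter metric)."""
    mean = sum(periods) / len(periods)
    return math.sqrt(sum((p - mean) ** 2 for p in periods) / len(periods))

def jitter_transfer(ref_periods, observed_periods):
    """J_OUT / J_REF (Eq. 5.17) or J_LOCAL / J_REF (Eq. 5.18)."""
    return period_jitter_rms(observed_periods) / period_jitter_rms(ref_periods)

if __name__ == "__main__":
    # Hypothetical period samples in arbitrary time units
    ref = [10.0, 10.2, 10.0, 9.8]
    out = [1.0, 1.01, 1.0, 0.99]
    print(f"T_ref2out = {jitter_transfer(ref, out):.3f}")
```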

When the reference jitter frequency is below the natural frequency of the loop,  $T_{ref2local}$  yields a number which is used as the reference value for the other results, as it is supposed to lie on the flat region of the jitter transfer function. As the reference jitter frequency increases to 1.0MHz, which is also the natural frequency of the loop, we observe an increase in the transfer function, which is still below 1, confirming that there is no jitter peaking. When the reference jitter frequency is increased further to 10MHz, the transfer function yields a very small value compared to the others, as expected. Table 5.1 confirms that there is no jitter peaking, even though JP=0.1dB is the target in the parametrization phase.

After verifying the jitter transfer functions by introducing sinusoidal reference jitter and investigating the effects of different parameter sets for the actual circuit, white jitter must be added to the reference and the VCO in order to find the optimum  $\omega_n$ . Since we use a differential self-biased 3-stage ring oscillator with a level-shifting differential-to-single-ended converter at the output, we expect a relatively high VCO jitter compared to that of the reference clock, which is generated by a relatively precise crystal oscillator. To filter the VCO jitter out, the loop should not be very narrow-band. Fig. 5.27 shows an example simulation result in which the output cycle-to-cycle jitter is 160fs as a result of white reference jitter with a

Table 5.1: Jitter transfer function verification result with sinusoidal reference jitter ($\omega_n = 1$ MHz).

| $T_{ref2out}$ | $T_{ref2local}$ | Jitter Frequency |
|---------------|-----------------|------------------|
| 0.0043        | 0.433           | 0.1 MHz          |
| 0.0067        | 0.84            | 1.0 MHz          |
| 0.0003        | 0.04            | 10.0 MHz         |



Figure 5.26: Introduced sinusoidal jitter (left-top), its histogram form (left-bottom), observed PLL output (right-top), and its histogram form (right-bottom).


Figure 5.27: Introduced white jitter (left-top), its histogram form (left-bottom), observed PLL output (right-top), and its histogram form (right-bottom).

peak-to-peak value of 100ps. Similar simulations are performed for natural frequencies of  $\omega_n = 0.5MHz$ ,  $\omega_n = 1MHz$ , and  $\omega_n = 1.5MHz$ ; Table 5.2 shows the numerical results. As  $\omega_n$  increases, the ability of the loop to filter the reference jitter degrades.

| $T_{ref2out}$ | $T_{ref2local}$ | $\omega_n$ |
|---------------|-----------------|------------|
| 0.00025       | 0.049           | 0.5 MHz    |
| 0.0013        | 0.207           | 1.0 MHz    |
| 0.0034        | 0.53            | 1.5 MHz    |

Table 5.2: Jitter transfer function verification result for white reference jitter.

### 5.6.4 Transistor Level Implementation

This section summarizes the device level implementation of each building block within the CP-PLL and provides the explicit schematics.

#### **Differential VCO**

A single-ended full-swing approach results in better phase noise performance. The jitter characteristics of such oscillators are also better, since the jitter is proportional to the duration for which the gain transistors are on, and full-swing operation minimizes this on-time. However, these statements hold only if the operating environment is quiet enough, which is not the case for nuclear and HEP experiments. Due to its lower sensitivity to substrate and supply noise, as well as its lower noise injection into other circuits on the same chip [90], the differential signaling approach is chosen.

Fig. 5.28 shows the implemented ring-type Voltage Controlled Oscillator (VCO), which uses the self-biasing technique [36]. It consists of three differential delay cells (D) together with their biasing circuitry to minimize process dependencies. At the output stage, however, the differential low-swing signal is converted (by D2S) to a single-ended full-swing signal to be used within the CP-PLL.

Fig. 5.29 shows the delay cell and the biasing circuit used for both the delay cell and the differential-to-single-end converter. The design is adopted from [36]. The signal bn is the control voltage of the VCO, which generates the signal bp used by the load



Figure 5.28: Top level view of self-biased 3-stage differential ring-type VCO with its self-biased single-end converter.

transistors of the differential pair to establish process independence. The transistor sizes and numbers of fingers are  $12\mu/0.12\mu$  and 3 for the pMOS devices, and  $34\mu/0.12\mu$  and 17 for the gain transistors. The tail transistor has a size of  $64\mu/0.12\mu$  with a single finger.

The important issues in RF VCO design are the accuracy of the device models and the control curve of the VCO. Two transistor models were available in the target technology for the VCO design: normal and RF. Since transistors with RF model parameters give pessimistic results in terms of oscillation frequency, they are used in schematic level simulations to better predict the final result. However, normal transistors are used in the layout. This choice can be justified based on the validity regions of the device models, which are constructed according to the operating frequencies [6]. In case *normal* and RF models are interpreted as *low* 



Figure 5.29: Delay cell for self-biased 3-stage ring-type VCO (left) and the biasing circuit (right).



Figure 5.30: Implemented differential-to-single-end converter.

and *high* frequency models, then the question of at which frequency they diverge is inevitable. A measurement result confirming such a divergence is given in the next chapter based on a simple single-ended GVCO design.

Fig. 5.30 shows the implemented differential-to-single end converter with the self-biasing technique, using the same bn control voltage. The transistors in the biasing circuit and the output differential-to-single-end converter have the same corresponding sizes and numbers of fingers. It should be noted that the last stage in the differential-to-single-end converter introduces a duty cycle error. Therefore, the serializer design is modified such that only the rising edges of the VCO output are used, so the duty cycle error has no effect on the serializer operation.

However, by sacrificing half the energy of the signal at the output of the differential ring oscillator, an inherent 50% duty cycle can be obtained [11]. Fig. 5.31 shows an alternative differential-to-single end converter implementation. It consists of an input differential pair and an output inverter together with their replicas, which generate reference signals ensuring proper operation and process independence. At the first stage, the signal is accepted and half of its energy is used to drive the inverter. The tail current of the input differential pair is adjusted by an opamp such that its output common mode is kept equal to the transition point ( $V_{TH}$ ) of the output inverter. This is guaranteed by the short-circuited replica inverter generating the reference signal for the opamp. The opamp tries to maintain the unused output of the replica differential pair at the same voltage level as the replica inverter output by adjusting the tail



Figure 5.31: Inherently 50% duty cycle differential-to-single-end converter.

current. The inputs of the replica differential pair are maintained at the common mode (CM) level of the differential ring oscillator. This is done either by replicating a single delay stage and short-circuiting both of its outputs, or by using 3 taps in the VCO and reserving one of them as the CM reference generator. Since the same tail current is used at the input differential pair, the output inverter generates an inherently 50% duty cycle signal. The sizes of the inverters are equal and they are laid out close to each other.

The drawback of the implementation shown in Fig. 5.31 is the wasted input signal energy, which manifests itself as a lower slew rate at the output inverter. Fig. 5.32 shows the output signals (left) at a certain frequency and the derivatives (right) of both. Since the implementation shown in Fig. 5.31 wastes half the input signal energy, its output signal has half the slew rate of that of Fig. 5.30. Even though the slew rate can be increased by forcing a bigger current at the output inverter, this decreases the robustness of the circuit. At higher frequencies, this approach results in a non-switching inverter, which is catastrophic, whereas the implementation shown in Fig. 5.30 always has an oscillating output, even though the output can lose the nominal voltage levels and generate a non-50% duty cycle signal. Considering these problems at high frequencies, together with issues like the stability of the control voltage that adjusts the current levels and the mismatch between the original stages and their replicas, the alternative approach is not used within the production system.

Considering a PLL, the VCO is the bottleneck in terms of operating frequency and jitter performance, as it is the block running at full speed. In our design, the VCO is the

| Index | Corner | T [°C] | Vdd [V] | Abbreviation |
|-------|--------|--------|---------|--------------|
| 0     | 1      | 125    | 1.08    | 1_125_108    |
| 1     | 1      | 25     | 1.2     | 1_25_120     |
| 2     | 1      | -20    | 1.32    | 1_-20_132    |
| 3     | 2      | 125    | 1.08    | 2_125_108    |
| 4     | 2      | 25     | 1.2     | 2_25_120     |
| 5     | 2      | -20    | 1.32    | 2_-20_132    |
| 6     | 3      | 125    | 1.08    | 3_125_108    |
| 7     | 3      | 25     | 1.2     | 3_25_120     |
| 8     | 3      | -20    | 1.32    | 3_-20_132    |
| 9     | 4      | 125    | 1.08    | 4_125_108    |
| 10    | 4      | 25     | 1.2     | 4_25_120     |
| 11    | 4      | -20    | 1.32    | 4_-20_132    |
| 12    | 5      | 125    | 1.08    | 5_125_108    |
| 13    | 5      | 25     | 1.2     | 5_25_120     |
| 14    | 5      | -20    | 1.32    | 5_-20_132    |

Table 5.3: Technology process corners used for parameter extraction.

first block which is designed at device level, and its parameters<sup>16</sup> are extracted from device level simulations for the 15 corners listed in Table 5.3, to be used within the Verilog environment before full behavioral PLL simulations. That is, in all the behavioral simulations the VCO gain,  $K_o$ , is a variable. Fig. 5.33 shows the VCO control curves for different process corners and Fig. 5.34 shows the derivatives of the fit functions of these curves, i.e. the VCO gain. The worst case maximum operating frequency of the VCO is approximately 5.7 GHz (upper dashed line in Fig. 5.33), which provides enough margin considering the nominal operating frequency of 4.8 GHz (dotted line in the plot). The intersections of the dotted line with the two circled curves represent the minimum and maximum control voltages in the locked state.
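The relation between a control curve and the VCO gain can be illustrated with a short Python sketch. It uses central differences on sampled points instead of differentiating a fit function, which is a simplification of the procedure described above, and the sample values are hypothetical (they are not read from Fig. 5.33).

```python
def vco_gain(v_ctrl, freq):
    """Estimate K_o = df/dV at the interior samples by central differences.

    v_ctrl: control voltages in V, freq: oscillation frequencies in Hz.
    Returns a list of (voltage, gain in Hz/V) pairs.
    """
    gains = []
    for i in range(1, len(v_ctrl) - 1):
        slope = (freq[i + 1] - freq[i - 1]) / (v_ctrl[i + 1] - v_ctrl[i - 1])
        gains.append((v_ctrl[i], slope))
    return gains

if __name__ == "__main__":
    # Hypothetical control-curve samples for a single corner
    v = [0.4, 0.5, 0.6, 0.7, 0.8]
    f = [3.0e9, 3.9e9, 4.7e9, 5.4e9, 6.0e9]
    for volt, k_o in vco_gain(v, f):
        print(f"V_ctrl = {volt:.1f} V, K_o = {k_o / 1e9:.2f} GHz/V")
```

In a behavioral model, a table of this kind per corner would let  $K_o$  be looked up as a function of the control voltage rather than treated as a constant.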

<sup>16</sup>This includes control curve and duty cycle error parameters for different process corners.

Figure 5.32: Output signals (left, red curve belongs to the alternative d2s) of both d2s implementations and their derivatives (right, smaller height blue curve belongs to the alternative d2s).

As the technologies scale down and VCO operating frequencies increase, a relatively high VCO gain is inevitable. This is not problematic considering basic PLL dynamics, since the feedback divide ratio of 120 is also relatively large. A high VCO gain can result in poor jitter dynamics, but keeping the CP current low enough and integrating as much filter capacitance as possible can be the remedy. Therefore, techniques like control curve segmentation are not applied, since they require calibration and can cause disturbance in a radioactive environment which can not be accessed physically during operation.

Power consumption and jitter trade off against each other in ring oscillators. Also considering SEU and SEL<sup>17</sup> in a radioactive environment, the current consumption is kept relatively high, so that the charge generated by a possible ionizing particle is very small compared to the amount of charge that moves during nominal VCO operation. Fig. 5.35 shows the total power consumption as a function of control voltage. The duty-cycle error as a function of process corner is given in Fig. 5.36.

The duty-cycle error introduced by the differential-to-single-end converter is inevitable and as seen in Fig. 5.36, it varies with the corner parameter set. This

<sup>17</sup>SEU and SEL stand for Single Event Upset and Latch-up, respectively.



Figure 5.33: VCO control curves for 15 corners.



Figure 5.34: VCO gain curves for 15 corners with lines drawn at maximum.



Figure 5.35: VCO total power consumption for the 15 process corner indices.



Figure 5.36: VCO output duty cycle error as a function of corner index.

results in a systematic jitter at the output of the serializer. The actual 4:1 MUX blocks are replaced by the switches imitating the 4:1 MUX functionality to remove the correlation between the VCO duty-cycle error and the output jitter. The proper functioning of the resulting serializer design depends only on the rising edges of the clocks involved.

#### Feed-Back Divider

The feedback divider within CP-PLL (Fig. 5.13) is also used to generate the clock signals for the serializer (Fig. 5.12), namely the *Load*,  $f_{bit}/4$  and Qi signals. It consists of two different dividers: divide-by-four (%4) and divide-by-thirty (%30), thus a total division ratio of 120.

Fig. 5.37 shows the %4 part. It consists of four D-FFs, an SR-FF and a 3-input NOR gate closing the feedback loop. It is important for the serializer to have precisely equally spaced rising edges controlling the switches in Fig. 5.12. The %4 block shown in Fig. 5.37 guarantees that the Qi signals are equally spaced; note



Figure 5.37: Divide-by-four part of the feedback divider and its timing diagram.

that the serializer does not use falling edges of any clock signal to avoid duty-cycle related problems introduced by differential-to-single-end converter placed after the VCO.

The divide-by-30 block of the feed-back divider is basically a counter with a triple-redundancy scheme. Every 30 counts, it generates the *Load* signal, so once per 120 VCO cycles, the 120-bit word is loaded into the four 30-bit registers to be serialized.
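The cascade of the two dividers can be pictured with a small behavioral sketch in Python. It only counts VCO cycles and reports when *Load* would be asserted; the triple redundancy and the four Qi phases are deliberately left out, and the function name is hypothetical.

```python
def load_cycles(n_vco_cycles, div_fast=4, div_slow=30):
    """Return the VCO cycle indices (1-based) at which Load is generated.

    The fast divider wraps every div_fast VCO cycles and clocks the slow
    counter, which wraps every div_slow counts and then issues Load.
    """
    loads = []
    fast_count = 0
    slow_count = 0
    for cycle in range(1, n_vco_cycles + 1):
        fast_count += 1
        if fast_count == div_fast:
            fast_count = 0
            slow_count += 1
            if slow_count == div_slow:
                slow_count = 0
                loads.append(cycle)
    return loads

if __name__ == "__main__":
    print(load_cycles(360))  # one Load per 120 VCO cycles
    # With the nominal 4.8 GHz VCO frequency, the divided clocks are:
    print(4.8e9 / 4, 4.8e9 / (4 * 30))
```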

#### Phase/Frequency Detector

Fig. 5.38 shows the implemented PFD consisting of two edge detectors (ED) and a NOR gate for resetting. D inputs of the edge detectors are always kept at logic 1. As the phase difference between the inputs of the PFD decreases, the up or down signal widths decrease accordingly. Fig. 5.39 shows the edge detector implementation where all the nMOS and pMOS devices have sizes to achieve equal current driving capability. The design is adopted from [109].
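The dependence of the pulse widths on the input phase difference can be sketched behaviorally in Python. The model below assumes that the *up* output is set by the reference edge and the *down* output by the feedback edge, and that both are cleared a fixed reset delay after the later edge; these assignments and the parameter `t_reset` are assumptions made for illustration.

```python
def pfd_pulse_widths(t_ref, t_fb, t_reset=0.0):
    """Return (up_width, down_width) for one pair of rising edges.

    Each edge detector output goes high at its own rising edge (D tied to
    logic 1); both outputs are cleared t_reset after the later edge.
    """
    t_clear = max(t_ref, t_fb) + t_reset
    return t_clear - t_ref, t_clear - t_fb

if __name__ == "__main__":
    # Reference leads the feedback clock by a shrinking amount (arbitrary units)
    for lead in (2.0, 1.0, 0.5):
        print(lead, pfd_pulse_widths(0.0, lead, t_reset=0.25))
```

In this model the wider pulse shrinks with the phase difference while the narrower one stays at the reset delay.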



Figure 5.38: Implemented phase/frequency detector.




Figure 5.39: Implemented edge detector.

#### Charge Pump and Loop Filter

The programmable charge pump is shown in Fig. 5.40. It is a conventional binary voltage to current converter consisting of pass-gates which are controlled by the PFD output signals, u and d, a D/A<sup>18</sup>, and a basic voltage follower, VF. As the PFD asserts u and d, the pass-gates switch the circuit from one configuration to the other, and the current mirrored by the transistors (T1-T5) is pumped into or out of the output load accordingly. VF keeps its input and output at the same voltage level, preventing the so-called *charge sharing* phenomenon from occurring. The amount of current is determined by the D/A with steps of  $1\mu A$ . The adjustable resistor formed by the R-network, and the capacitors C1 and C2 formed by the C and C' networks, can have values which are integer multiples of 2.5k$\Omega$, 15pF, and 1pF, respectively.
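To see how the programmable steps map onto loop dynamics, the sketch below applies the standard textbook second-order approximation of a charge-pump PLL. It assumes that C1 is the capacitor in series with R, neglects C2, and uses a hypothetical VCO gain; it is not the parametrization procedure used for the actual design, and the function name is illustrative.

```python
import math

I_STEP = 1e-6      # charge pump current step in A (from the text)
R_STEP = 2.5e3     # resistor step in ohm (from the text)
C1_STEP = 15e-12   # C1 step in F (from the text)

def loop_parameters(i_code, r_code, c1_code, k_vco_hz_per_v, n_div=120):
    """Return (omega_n in rad/s, damping ratio) for integer settings.

    Second-order approximation with C2 neglected:
      omega_n = sqrt(I_cp * K_vco / (2 * pi * N * C1)), K_vco in rad/s/V
      xi      = R * C1 * omega_n / 2
    """
    i_cp = i_code * I_STEP
    r = r_code * R_STEP
    c1 = c1_code * C1_STEP
    k_vco = 2.0 * math.pi * k_vco_hz_per_v
    omega_n = math.sqrt(i_cp * k_vco / (2.0 * math.pi * n_div * c1))
    xi = 0.5 * r * c1 * omega_n
    return omega_n, xi

if __name__ == "__main__":
    # Hypothetical settings: 10 uA, 10 kOhm, 150 pF, K_o = 5 GHz/V
    omega_n, xi = loop_parameters(10, 4, 10, 5e9)
    print(f"omega_n = {omega_n:.3e} rad/s, xi = {xi:.2f}")
```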

<sup>18</sup>D/A stands for digital-to-analog converter.



Figure 5.40: Implemented charge pump and low-pass filter.

# Chapter 6

# **Burst-Mode CDR**

As a possible functional extension in GBT, the design and implementation details of a packet- or burst-capable clock and data recovery circuit are presented in this chapter.

# 6.1 Introduction

Clock and data recovery (CDR) is a critical function in high-speed transceivers. Such transceivers serve in many applications, including optical communications, back-plane routing, and chip-to-chip interconnects. The data received in these systems are both asynchronous and noisy, requiring that a clock be extracted to allow synchronous operations. Furthermore, the data must be re-timed such that the jitter accumulated during transmission is removed. CDR circuits must satisfy stringent specifications defined by communication standards, posing difficult challenges to system and circuit designers.

In order to perform synchronous operations such as re-timing and de-multiplexing on random data, high-speed receivers must generate a clock. As illustrated in Fig.



Figure 6.1: Clock and data recovery operation.

6.1, a clock recovery circuit senses the data and produces a periodic clock. A D-type flip-flop (D-FF) driven by the clock then re-times the data (i.e., it samples the noisy data), yielding an output with less jitter. As such, the flip-flop is sometimes called a *decision circuit*. The clock generated in the circuit of Fig. 6.1 must satisfy three important conditions:

- 1. It must have a frequency equal to the data rate; for example, a data rate of 10 Gb/s (each bit 100 ps wide) translates to a clock frequency of 10 GHz (with a period of 100 ps).
- 2. It must bear a certain phase relationship with respect to data, allowing optimum sampling of the bits by the clock; if the rising edges of the clock coincide with the midpoint of each bit, the sampling occurs farthest from the preceding and the following data transitions, providing maximum margin for jitter and other timing uncertainties.
- 3. It must exhibit a small jitter since it is the principal contributor to the re-timed data jitter.

CDRs usually use PLLs to control the decision circuits, thus the dynamics of a PLL based CDR is determined by the dynamics of the PLL itself. PLLs are relatively slow feed-back systems suitable for continuous communication. A PLL may need a few  $\mu s$  to lock; this is acceptable in continuous transmission because adding a sufficiently long header (preamble) could solve the problem of locking, since this would be done only once at the beginning of the communication.

## 6.1.1 Burst-Mode Network

Suppose the communication is not continuous- but packet- or burst-mode, i.e. the receiver is not supposed to accept continuous data from a single transmitter but to accept data in the form of packets from different transmitters, as seen in Fig. 6.2. The Tx blocks are not synchronous and their data rates, logic levels and protocols can be different. The burst-mode receiver is supposed to extract frequency, phase, logic level and protocol out of the data packet that it processes within a very short



Figure 6.2: Burst-mode network.

period. The aim is the *acceptance* of the data, which is usually not a long string and thus can not have a sufficiently long header for a PLL based CDR to lock. If a PLL based CDR is used at transmission rates of a few Gbit/s, an unacceptable amount of data would be lost before the PLL locks to the incoming data.
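The scale of the loss can be estimated with one line of arithmetic, shown here as a Python sketch. The lock time of 2 µs and the bit rate of 4.8 Gbit/s are assumed example values within the ranges mentioned in the text ("a few µs", "a few Gbit/s"), not measured figures.

```python
def bits_lost_during_lock(lock_time_s, bit_rate_bps):
    """Number of bits that arrive while the PLL is still acquiring lock."""
    return round(lock_time_s * bit_rate_bps)

if __name__ == "__main__":
    lost = bits_lost_during_lock(2e-6, 4.8e9)  # assumed example values
    print(f"bits received before lock: {lost}")
```

A header of this length is negligible for a continuous link that locks once, but it can exceed the payload of a short packet.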

Burst-mode CDRs come into play here. They are designed to lock to the incoming data in only a couple of transitions [37] within the header or preamble which precedes each packet. The receiver "learns" what type of data packet it is processing within the header and then, using this knowledge, parses the data, a process known as *data acceptance*.

### 6.1.2 CDR Classification

Considering the international literature, there is not a strict classification convention for CDRs. However, it would be useful to have one such that the number of architectural choices becomes evident taking their pros and cons into account. Behaviorally the CDR architectures can be classified as

- 1. continuous versus burst mode [97], depending on the mode of communication
- 2. closed loop versus open loop
- 3. filter-based versus over-sampling
- 4. clock delay versus data delay [94]
- 5. digital versus analog

and depending on different system requirements, CDRs can be based on

- 1. PLL Phase Locked Loop
- 2. DLL Delay Locked Loop
- 3. Semi and/or blind over sampling [63] [64] [98]
- 4. FSM Finite State Machine [108]
- 5. GVCO Gated Voltage Controlled Oscillator [67]
- 6. Hybrid

leading to a large number of different possibilities taking the combinations they can form into account. Because of space limitations, listing all those possible architectures is not possible. However, several representative examples from the internationally published literature will be briefly mentioned in the following subsections.

#### Generic Continuous-Mode Closed-Loop PLL-Based CDR

Fig. 6.3 shows a dual-loop PLL architecture which would sit inside the *CDR* block of Fig. 6.1, forming a continuous-mode closed-loop CDR. The operating principle of a PLL is presented at length in the previous chapter and in Appendix A. The two loops, namely *coarse* and *fine*, lock the local VCO to the incoming data first in terms of frequency and then in terms of phase, respectively. The block in Fig. 6.3 recovers the clock whereas, as seen in Fig. 6.1, the D-FF recovers the data by sampling it with this recovered clock.

It is a *closed-loop* architecture because it has a feed-back; in other words, it is a control loop with a *controller* and a *plant* being controlled via a feed-back path. It is *continuous-mode* because the locking process is rather slow, so it can not be used to lock instantaneously to the incoming data. This, in turn, requires that the communication be maintained in a continuous manner in order not to have to lock again and again.



Figure 6.3: A PLL-based CDR architecture with *fine* and *coarse* adjustments.

In such an architecture, there are issues concerning the co-operation of the two different *controllers* trying to adjust the behavior of the same *plant*. In case the loop parameters are not chosen properly, they can conflict leading to operation failure [91].

#### Burst-Mode Delay-the-Data Based CDR

Fig. 6.4 depicts a CDR architecture capable of handling packet- or burst-mode communication. In this architecture, a *variable delay* is applied to the incoming data to align them with the clock, CK, such that the properly delayed  $D_{in}$  can be clocked by the *Data Re-time* block, which is basically a D-FF. The phase detector compares the edges (e.g. the rising edges) of CK and  $D_{in}$  so that the *loop filter* can generate the control signal adjusting the amount of delay in the *variable delay* block. The edge detector adjusts the time constant of the loop filter such that when there is no transition in the incoming data,  $D_{in}$ , the delay remains the same.

The architecture is *burst-mode* because locking is very fast, and thus suitable for short *words* of incoming data, whereas a continuous stream can not be handled properly. This architecture is limited by the amount of delay which can be introduced to the incoming data [94]. The maximum delay that can be introduced is usually a few clock cycles of the frequency with which the incoming data are formed. In a continuous-mode communication, however, the timing error can integrate indefinitely



Figure 6.4: A burst-mode CDR architecture based on delaying the incoming data.

and the delay block can not compensate such an error continuously. Therefore, it is suitable only for burst-mode transfer.

#### Burst-Mode Open-Loop Gated-VCO Based CDR

Fig. 6.5 represents an open-loop packet-capable CDR architecture where the incoming data stream is clocked by a local oscillator, Gated-VCO. At every transition of the digital input, the phase of the local oscillator is reset via simply being restarted. *Gating circuit* generates the restarting or gating signal which in turn resets the local oscillator's phase. D-FF parses the incoming data with the clock generated by the gated-VCO.

The architecture is quite responsive as it can instantaneously lock the local oscillator to the incoming data. The issue is that if the frequency with which the incoming data has been formed is different enough from that of the local oscillator, then the D-FF would generate wrong output. This architecture must employ an



Figure 6.5: A burst-mode CDR architecture based on gated VCO.

ordinary PLL to keep this difference at a reasonable level (it is open-loop). As an example, considering 8b-10b line encoding [95], the maximum number of consecutive identical bits (CIB) would be 5. This leads to a frequency difference tolerance of, at maximum, 20%. That is, as a worst case scenario, if the difference between the two frequencies is less than 20%, communication is expected to be successful.
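The 20% figure follows from requiring that the timing error accumulated over the longest run without a transition stays below one bit period. The sketch below makes this budget an explicit parameter; the function name and the `budget_bits` argument are hypothetical, and a tighter budget than one full bit would give a proportionally smaller tolerance.

```python
def max_frequency_mismatch(max_cib, budget_bits=1.0):
    """Largest relative frequency difference tolerated between data and VCO.

    The error grows by `mismatch` bit periods per bit, so after max_cib
    bits without a transition it must still be below budget_bits.
    """
    if max_cib < 1:
        raise ValueError("max_cib must be at least 1")
    return budget_bits / max_cib

if __name__ == "__main__":
    # 8b-10b coding: at most 5 consecutive identical bits
    print(f"tolerance: {max_frequency_mismatch(5) * 100:.0f}%")
```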

One of the key issues of such an architecture is that the length of the gating signal must be as constant as possible [37] across the process and mismatch corners of the technology. It is usually chosen to be half of the incoming data bit length so that, e.g., the rising edge of the gated-VCO output falls in the middle of the incoming data bit, and thus the D-FF can parse properly.

#### Continuous-Mode Closed-Loop Blind-Oversampling Based CDR

Fig. 6.6 shows a blind-oversampling based CDR suitable for continuous communications [64]. Serial data stream is blindly parsed or clocked by a multi-phase local oscillator and the slices are stored in sample storage. Bit boundaries are detected in the relevant block and the slice which is equally far from the preceding and the next transition instances is taken as the accepted incoming data bit by data reduction block. To define the bit boundaries, an algorithm is executed which requires some time. After the completion of the algorithm execution, the best slice is chosen to be the sampled input.

Figure 6.6: A continuous-mode CDR architecture based on blind oversampling.

Contrary to the general misconception, blind-oversampling architectures can provide results comparable<sup>1</sup> to conventional CDRs, as discussed in detail in [64]. One of the issues involved is the minimization of the execution time of the bit boundary detection algorithm.

#### Continuous-Mode Closed-Loop Semi-Blind Oversampling Based CDR

A blind-oversampling CDR tracks the high-frequency jitter of the input data stream, but is limited at low frequencies by the size of its FIFO [64]. A phase-tracking CDR, on the other hand, tracks jitter at frequencies below  $\omega_{-3dB}$  of its loop filter, but performs poorly beyond this frequency [65]. Fig. 6.7 depicts a semi-blind-oversampling based architecture [63] which produces a jitter tolerance equal to the product of the jitter tolerances of a phase-tracking CDR and a blind-oversampling CDR. Therefore it increases the low-frequency jitter tolerance by a factor of 32 (limited by the FIFO size).

This is also a good example of how a *hybrid* architecture can be formed via combining two different approaches to benefit from the attractive properties of both, namely the high-frequency jitter-tracker blind over-sampling and the low-frequency jitter-tracker filter-based architectures.



Figure 6.7: Semi-blind-oversampling based CDR architecture with feed-back low-frequency jitter tracking.



<sup>1</sup>Bit error rates of the order of  $10^{-12}$.

#### Burst-Mode Finite State Machine Based CDR

Fig. 6.8 shows a CDR architecture using a finite state machine (FSM) with 1:2 DEMUX functionality [108], together with its state diagram. Each arrow in the state diagram corresponds to a state transition in the FSM. The binary value on the arrow represents the current value of the input. For the 1:2 de-multiplexer, the FSM has a total of eight states. The FSM stays in each state for a period equal to the length of one input data bit. Then it transitions to the next state, based on the input bit. The prime superscript in the state name is equivalent to the select line of a conventional de-multiplexer. States with a prime superscript correspond to the ones for which the input bit affects *out*<sub>2</sub>. The first subscript in the state name is the current input bit and the second subscript is the previous input bit, stored to hold the unaffected output.

As reported in [108], the architecture can perform 1:n de-multiplexing without additional clock recovery phase-locked loop or sampling blocks. The FSM is formed with combinational logic and analog LC transmission line delay cells in a feedback loop. The FSM responds to input data transitions instantaneously and sets the outputs. The system reduces unit interval jitter by a factor of de-mux'ing ratio (n).

The feed-back delay of  $T_b$  is an issue in the architecture. The architecture is actually accompanied by a replica FSM acting as a local ring oscillator [108], forming a delay locked loop (DLL) to keep  $T_b$  at its nominal value, since the timing operation strictly depends on it.



Figure 6.8: A burst-mode CDR architecture (left) based on FSM and its state diagram (right).


## 6.2 Architecture

This section deals with the CDR which is designed to be capable of handling both continuous- and burst-mode communications. The architecture is based on the CP-PLL presented in Chapter 5 and an adopted GVCO based blind parsing methodology.

Fig. 6.9 shows the burst-mode CDR architecture adopted from [37]. It consists of a CP-PLL to generate the local 4.8 GHz clock via GVCO2 and, more importantly, the control voltage  $V_{ctrl}$ , which is also used for GVCO1, laid out close to GVCO2. In addition to the CP-PLL control loop, the architecture has a *blind parser* consisting of an Edge Detector, ED, a local oscillator, GVCO1, and a D-type flip-flop, D-FF. Every time the digital input  $(D_{in})$  presents a transition to ED, a gating signal is generated to restart GVCO1 such that the phase error between the incoming data and GVCO1 is corrected instantaneously. The incoming data are also fed to the D-FF to be parsed by the clock generated by GVCO1. GVCO1 and GVCO2 are not locked to each other by any means. They are identical and use the same control voltage such that their oscillation frequencies are close.

In the extreme case, the requirement is that the period difference between the two oscillators be less than 20%, thanks to the line coding and error correction schemes used. They guarantee that the number of CIB is 5 in the worst case [95], hence the requirement of at most 20% frequency difference between the two oscillators. Every time



Figure 6.9: The burst-mode CDR architecture implemented.

a  $D_{in}$  transition occurs, the phase error is reset such that the accumulated phase error in the local VCO is inherently limited to one period, therefore the frequency difference should be minimized as much as possible especially at layout level.
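The blind-parsing idea can be illustrated with a behavioral Python sketch. It is a strongly simplified model, not the implemented circuit: the oscillator is assumed to restart exactly at each data transition, to sample half a VCO period later and then once per VCO period, and the gating pulse width, jitter and flip-flop timing are ignored. The function name is hypothetical.

```python
from itertools import groupby

def blind_parse(bits, t_bit, t_vco):
    """Recover bits with an oscillator that restarts at every data transition.

    Within a run of k identical bits, samples are taken at
    (j + 0.5) * t_vco after the transition, until the next transition
    at k * t_bit restarts the oscillator.
    """
    recovered = []
    for value, run in groupby(bits):
        run_length = len(list(run))
        j = 0
        while (j + 0.5) * t_vco < run_length * t_bit:
            recovered.append(value)
            j += 1
    return recovered

if __name__ == "__main__":
    data = [1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1]  # runs of up to 5
    for mismatch in (0.0, 0.05, 0.25):
        ok = blind_parse(data, 1.0, 1.0 + mismatch) == data
        print(f"VCO period +{mismatch * 100:.0f}%: recovered correctly = {ok}")
```

Because the error is cleared at every transition, only the longest run of identical bits matters, which is why matching GVCO1 to GVCO2 at layout level is stressed above.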

# 6.3 Transistor Level Implementation

Fig. 6.10 shows the test circuit used in SPICE simulations for proof-of-concept. It employs a *PRBS* generator<sup>2</sup> providing the random input data, the edge detector (ED) to generate the *gating signal* starting/stopping the GVCO, a buffered GVCO whose clock signal is used by the D-FF to blindly parse the incoming data, and an *equalizing delay* to remove process variations from the timing operation.



Figure 6.10: Simplified circuit of burst-mode CDR for simulation.
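The role of the edge detector can be illustrated with a small behavioral model. This section does not give the gate-level structure of the ED; the sketch below assumes a common realization, an XOR of the data with a copy of itself delayed by half a bit, operating on an oversampled waveform. The helper names and the choice of four samples per bit are assumptions made for illustration.

```python
def oversample(bits, samples_per_bit):
    """Repeat each bit so that the waveform has several samples per bit."""
    return [b for b in bits for _ in range(samples_per_bit)]


def edge_detector(samples, delay):
    """XOR each sample with the sample `delay` steps earlier.

    The output is 1 for `delay` samples after every transition of the
    input, which models a gating pulse of that width.
    """
    out = []
    for i, s in enumerate(samples):
        prev = samples[i - delay] if i >= delay else samples[0]
        out.append(s ^ prev)
    return out


bits = [0, 1, 1, 0, 0, 0, 1]          # three transitions
wave = oversample(bits, samples_per_bit=4)
gate = edge_detector(wave, delay=2)   # delay of half a bit

# Count the gating pulses by counting 0 -> 1 steps in the gate signal.
n_pulses = sum(1 for a, b in zip(gate, gate[1:]) if (a, b) == (0, 1))
print(gate, n_pulses)
```

Each input transition produces exactly one pulse whose width equals the delay, regardless of how many identical bits preceded it.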

<sup>2</sup>PRBS stands for Pseudo Random Bit Sequence.

The *delay* within the ED is set to half the incoming bit width, so that the rising edge of the buffered output nominally arrives at the D-FF in the middle of each bit. This delay must be precise and independent of process variations; it is therefore implemented as a delay line with equally sized inductors and capacitors, as in Fig. 6.11. The delay at low frequencies is

$$T_d = \frac{2L}{R} \tag{6.1}$$

where R is the terminating resistor, not shown in the figure. The terminating resistor is set according to the impedance matching requirements. To improve the loss and group delay of the delay line at frequencies above 1 GHz, the characteristic impedance of the delay line,

$$Z_h = \sqrt{\frac{L}{C}} \tag{6.2}$$

should also be matched with the terminating resistor.

Figure 6.11: Delay line circuit implemented.
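Eqs. 6.1 and 6.2 together fix the component values once the delay and the terminating resistor are chosen. The sketch below works this out for a delay of half a bit period. The 50 Ω termination and the 4.8 Gb/s bit rate are assumptions made for illustration; the thesis does not state the values used in the actual design.

```python
import math


def delay_line_components(t_delay, r_term):
    """Return (L, C) for a matched delay line.

    Uses T_d = 2 L / R (Eq. 6.1) to obtain L, and the matching
    condition Z_h = sqrt(L / C) = R (Eq. 6.2) to obtain C.
    """
    inductance = t_delay * r_term / 2.0
    capacitance = inductance / r_term ** 2
    return inductance, capacitance


bit_period = 1.0 / 4.8e9      # assumed: one bit per 4.8 GHz clock period
t_delay = bit_period / 2.0    # half the incoming bit width
r_term = 50.0                 # assumed terminating resistor, in ohm

L, C = delay_line_components(t_delay, r_term)
z_h = math.sqrt(L / C)

print(f"L = {L * 1e9:.3f} nH, C = {C * 1e12:.3f} pF, Z_h = {z_h:.1f} ohm")
```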

Fig. 6.12 shows the test structure used to measure the control curve and the jitter performance of the gated VCO. It consists of a buffered GVCO followed by a divide-by-64 stage, so that the waveform can be observed with an ordinary oscilloscope. The divider is a chain of six divide-by-two stages, of which the first two are implemented in dynamic TSPC<sup>3</sup> logic [66] and the last four in static logic. Fig. 6.13 shows the dynamic D-FF implementation. All pMOS transistors have the same size of $2\mu/0.12\mu$ and all nMOS transistors the same size of $0.56\mu/0.12\mu$; all are two-fingered.

Figure 6.12: Schematic of gated VCO test structure.

Figure 6.13: Fast TSPC D-FF.
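The divider chain described above can be modeled as six cascaded toggle stages. The following behavioral sketch assumes that each stage toggles on the rising edge of the previous one. It is not a model of the TSPC circuit itself, and the function name is illustrative.

```python
def ripple_divider_edges(n_stages, n_input_edges):
    """Count rising edges at the output of cascaded divide-by-two stages."""
    state = [0] * n_stages
    out_rising = 0
    for _ in range(n_input_edges):
        # Stage 0 toggles on every input rising edge; each later stage
        # toggles only when the previous stage goes from 0 to 1.
        stage = 0
        while stage < n_stages:
            state[stage] ^= 1
            rising = state[stage] == 1
            if stage == n_stages - 1 and rising:
                out_rising += 1
            if not rising:
                break
            stage += 1
    return out_rising


edges_out = ripple_divider_edges(n_stages=6, n_input_edges=640)
f_out_hz = 3.84e9 / 2 ** 6   # a 3.84 GHz input gives a 60 MHz output

print(edges_out, f_out_hz)
```

With six stages, 640 input edges produce 10 output edges, which is the divide-by-64 ratio used in the test structure.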

For the D-FF to function properly at 5 GHz, several issues must be addressed. Flip-flops are usually characterized by a set of timing constraints that define the minimum intervals during which specific signals must be held steady to ensure correct operation. These constraints include the *setup time*, *hold time*, *recovery time* and *minimum pulse width*. The D-FF shown in Fig. 6.13 can work up to 10 GHz with a reasonable margin.

The GVCO has the simplest possible architecture: it has neither dedicated biasing circuitry nor any scheme to minimize process and mismatch effects. Before the output pad driver at the very end, the output of the divide-by-64 block is re-clocked by the original buffered GVCO, so that the jitter seen at the output of the test structure is expected to be very close to that of the GVCO under test. Fig. 6.14 shows the layout of the test structure, which has been tested successfully.

<sup>3</sup>TSPC stands for True Single Phase Clock.



Figure 6.14: Layout of gated VCO  $(10x4\mu m^2)$  test structure.

# 6.4 Simulation Results

Fig. 6.15 shows the waveforms at the input and the output of the delay line. The delay between the two waveforms varies by less than 1.5 ps between the extreme corners, as seen in Fig. 6.16.

Figure 6.15: Delay line transient response.

Figure 6.16: Delay variation as a function of process corner; the input signal amplitude is 1 V and the rise/fall times are 20 ps.

Fig. 6.17 shows the signals involved in a burst-mode acquisition: each transition of the input data generates the gating signal that stops and restarts the GVCO, which in turn parses the incoming stream. A long CIB sequence is visible at the beginning, during which the GVCO jitter accumulates until the next transition. When the transition occurs, the gating signal is generated and the GVCO is restarted in such a way that the phase error between the oscillator and the incoming data is reset.

Note that a gating signal is generated at every transition of the incoming data, even in a perfectly locked state, so a slight disturbance, or jitter, can be expected in the recovered clock. However, as long as the frequency difference between the two GVCOs is less than 20%, this should not translate into jitter at the output: provided the data are captured correctly, the recovered data can be re-clocked by a system clock, which removes the timing uncertainty. Once the data are accepted properly, re-timing can be done at any level.

The frequency difference between the two oscillators can be estimated with Monte Carlo (MC) simulations. Two GVCOs are simulated with identical control voltages and with their gating signals held at $V_{dd}$.
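The way such a run is post-processed can be sketched with a toy model. The code below is not the circuit-level MC simulation used in the thesis: it simply draws the two oscillator frequencies from independent Gaussian distributions and extracts the peak-to-peak and standard deviation of their difference over 200 runs. The 0.2 GHz spread per oscillator, the seed and the function name are assumptions chosen only for illustration.

```python
import random
import statistics


def mc_frequency_difference(f_nominal_ghz, sigma_ghz, runs=200, seed=1):
    """Draw two oscillator frequencies per run and return their differences."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(runs):
        f1 = rng.gauss(f_nominal_ghz, sigma_ghz)
        f2 = rng.gauss(f_nominal_ghz, sigma_ghz)
        diffs.append(f1 - f2)
    return diffs


diffs = mc_frequency_difference(f_nominal_ghz=7.2, sigma_ghz=0.2)
peak_to_peak = max(diffs) - min(diffs)
sigma_diff = statistics.pstdev(diffs)

print(f"peak-to-peak = {peak_to_peak:.2f} GHz, sigma = {sigma_diff:.2f} GHz, "
      f"relative = {peak_to_peak / 7.2:.1%}")
```

The peak-to-peak value, rather than the standard deviation, is the quantity compared against the 20% limit in the discussion that follows.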

Figure 6.17: Burst-mode operation showing, from bottom to top, the input data stream, the output of the gating circuit (ED), the output of the local GVCO (recovered clock) and the recovered data stream.

Fig. 6.18 shows the distribution of the frequency difference between the two GVCOs, together with the output frequency distributions of GVCO1 and GVCO2, for 200 runs. The left-most distribution has a worst-case peak-to-peak frequency difference of approximately 1 GHz, while the output frequencies of the GVCOs are approximately 7.2 GHz; this corresponds to a frequency difference of less than 14% ((1.0/7.2) $\cdot$ 100).

However, as the biasing strength decreases, that is, as the $V_{ctrl}$ controlling the current levels is lowered, the variation in the frequency difference increases. In percentage terms the variation grows even faster, since the operating frequency drops as well. Fig. 6.19 shows the same distributions with a lowered $V_{ctrl}$. The left-most distribution has a worst-case peak-to-peak frequency difference of approximately 1.25 GHz, while the output frequencies of the GVCOs are approximately 5.24 GHz; this corresponds to a frequency difference of less than 24% ((1.25/5.24) $\cdot$ 100).

Figure 6.18: MC simulation results showing the frequency difference distribution between the two GVCOs (left-most) and the output frequency distributions of both the GVCOs (middle and right-most) when $V_{ctrl} = V_{dd}$.

Figure 6.19: MC simulation results showing the frequency difference distribution between the two GVCOs (left-most) and the output frequency distributions of both the GVCOs (middle and right-most) when $V_{ctrl} = V_{dd}/2$.

In both simulations the standard deviation is less than 300 MHz; the peak-to-peak spread, however, is the figure considered here. These results show that, in the worst case, the CDR can fail because the frequency difference can exceed 20% peak-to-peak. For the purpose of testing the architecture this has no significant effect, since only a very small number of chips are expected to fail.

Figure 6.20: MC simulation results showing the frequency difference distribution between the two VCOs (left-most) and the output frequency distributions of both the VCOs (middle and right-most); see text.

Even though the very first test structure was fabricated on a shared wafer with the GVCO depicted in Fig. 6.12, the VCO developed for the CP-PLL will be used in the production system. The same set of simulations is therefore performed on the VCO developed for the CP-PLL in the previous chapter. Two VCOs are instantiated and biased in the same way. Fig. 6.20 shows the corresponding distributions, that is, the frequency difference between the two VCOs (left-most) and the output frequencies of both VCOs (middle and right-most), for two cases: $V_{ctrl} = V_{dd}$, corresponding to the maximum operating frequency, and $V_{ctrl} = 495\,mV$, corresponding to the nominal operating frequency.

As seen from the distributions, the variation in the difference between the operating frequencies of the two VCOs is almost independent of the MC parameters in use. The left-most distribution on the top row has a worst-case peak-to-peak frequency difference of approximately 0.5 GHz, while the output frequencies of the VCOs are approximately 4.9 GHz; this corresponds to a frequency difference of approximately 10% ((0.5/4.9) $\cdot$ 100).

The left-most distribution on the bottom row has a worst-case peak-to-peak frequency difference of approximately 0.4 GHz, while the output frequencies of the VCOs are approximately 9.77 GHz; this corresponds to a frequency difference of approximately 4% ((0.4/9.77) $\cdot$ 100).

Finally, Fig. 6.21 shows the frequency difference distributions with the control voltage as the parameter. The variation in frequency difference between the two oscillators along the control curve is acceptably narrow and fairly independent of the control voltage or, equivalently, of the operating frequency.

In conclusion, the special biasing scheme used within the VCO provides the process and mismatch independence required by the architecture. Even under worst-case conditions, the frequency difference of approximately 10% is half of the 20% limit, so the VCO operates with a 50% margin.
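The percentages quoted in this section, and the resulting margin with respect to the 20% limit, can be checked with a few lines of arithmetic. The numbers below are the peak-to-peak differences and operating frequencies reported in the text; the dictionary labels and function names are illustrative.

```python
LIMIT = 0.20  # maximum tolerated relative frequency difference

# (peak-to-peak difference in GHz, operating frequency in GHz)
cases = {
    "GVCO, Vctrl = Vdd": (1.0, 7.2),
    "GVCO, Vctrl = Vdd/2": (1.25, 5.24),
    "VCO, nominal": (0.5, 4.9),
    "VCO, maximum": (0.4, 9.77),
}


def relative_difference(peak_to_peak_ghz, frequency_ghz):
    """Peak-to-peak frequency difference as a fraction of the frequency."""
    return peak_to_peak_ghz / frequency_ghz


def margin(relative, limit=LIMIT):
    """Fraction of the limit left unused; negative if the limit is exceeded."""
    return 1.0 - relative / limit


results = {name: relative_difference(*values) for name, values in cases.items()}

for name, rel in results.items():
    print(f"{name:22s} {rel:6.1%}  margin {margin(rel):+.0%}")
```

Only the single-ended GVCO at the lowered control voltage exceeds the limit, while the worst case of the VCO developed for the CP-PLL leaves roughly half of the budget unused.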



Figure 6.21: MC simulation result of differential VCO, showing the frequency difference distribution with  $V_{ctrl}$  as the parameter.

# 6.5 Measurement Results

At the time of writing, the only available measurements relate to the test structure of Fig. 6.12. The setup is shown in Fig. 6.22. Fig. 6.23 plots the measured and MC-simulated control curves of the GVCO, where the measured output frequency is multiplied by 64. Even though the measurement appears to lie within the MC boundaries, this is not certain, since only one chip is available for testing. Additionally, $V_{dd}$ is set to 1.44 V, whereas $V_{ctrl}$ is varied over the same range in both cases, from 0.35 V to 1.2 V. The reason for the increased $V_{dd}$ is that the fabricated circuit did not generate a meaningful output signal when $V_{ctrl}$ exceeded 0.7 V. This condition could not be reproduced in simulation. It is most probably related to the effects of pre-fabrication steps, which cannot be simulated because they are performed by the foundry (e.g. chip filling), and to the accuracy of the transistor models in use, as discussed in Chapter 7. The measured power spectrum at the output is given in Fig. 6.24, where the GVCO frequency is 3.84 GHz.

Figure 6.22: The GVCO test setup.

Figure 6.23: The measured and MC simulated control curves of the single-ended GVCO.

Figure 6.24: The measured output power spectrum of the single-ended GVCO.

The ambiguity at the rising edge immediately following the one on which the oscilloscope triggers is interpreted as the cycle-to-cycle jitter. Its overlapped waveform and histogram are depicted in Fig. 6.25. The cycle-to-cycle jitter distribution has a standard deviation of 13.08 ps. However, the oscilloscope used has a limited bandwidth of 500 MHz and its own internal jitter, so the 13.08 ps includes the jitter of the oscilloscope, which must be subtracted. Fig. 6.26 shows the overlapped waveform and histogram of the rising edge on which the oscilloscope triggers. At this edge there should be no timing uncertainty related to the input signal, so the jitter measured there (9.52 ps) is interpreted as the jitter of the measurement system. To obtain the jitter of the GVCO under test, the contribution of the measurement system is subtracted in quadrature from the cycle-to-cycle jitter measured at the second rising edge: $J_{VCO,C2C} = \sqrt{13.08^2 - 9.52^2} = 8.97 \, ps$, where $J_{VCO,C2C}$ is the cycle-to-cycle jitter associated with the GVCO only. The output frequency is 60 MHz, that is, the VCO oscillates at 64 $\times$ 60 MHz = 3.84 GHz. This corresponds to a period of 1/3.84 GHz = 260 ps, and the oscillator jitter in units of UI<sup>4</sup> becomes 8.97/260 = 0.0345 UI, or 3.45% of the VCO output period.

Figure 6.25: The measured overlapped waveform of the single-ended GVCO at the first rising edge after the one at which the oscilloscope triggers.

Figure 6.26: The measured overlapped waveform of the single-ended GVCO at the rising edge on which the oscilloscope triggers.
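The jitter arithmetic above can be reproduced as follows. The quadrature subtraction is only valid if the oscillator and the measurement system contribute independent jitter, which is the assumption implicit in the calculation. The function name is illustrative.

```python
import math


def subtract_in_quadrature(total_ps, system_ps):
    """Remove an independent jitter contribution from a measured total."""
    if system_ps > total_ps:
        raise ValueError("system jitter cannot exceed the measured total")
    return math.sqrt(total_ps ** 2 - system_ps ** 2)


j_vco_ps = subtract_in_quadrature(total_ps=13.08, system_ps=9.52)

f_out_hz = 60e6                 # measured after the divide-by-64 stage
f_vco_hz = 64 * f_out_hz        # 3.84 GHz
period_ps = 1e12 / f_vco_hz     # about 260 ps
j_vco_ui = j_vco_ps / period_ps

print(f"J = {j_vco_ps:.2f} ps, T = {period_ps:.1f} ps, J = {j_vco_ui:.4f} UI")
```

Using the unrounded period of about 260.4 ps gives a value marginally below the 0.0345 UI quoted above, which is computed with the period rounded to 260 ps.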

<sup>4</sup>UI stands for unit interval.

### Chapter 7

### Conclusions

### 7.1 The CMAD

Three versions of the binary read-out ASIC for the RICH-I detector system of the COMPASS experiment at CERN have been designed. The last chip, called CMADv3 or *the* CMAD, is the production chip that will be used in the experimental system. The ASIC is implemented in a commercially available **0.35** $\mu$m CMOS technology. It amplifies the signals coming from fast multi-anode photomultipliers and compares them against a *threshold generated on-chip on a channel-by-channel basis*. The chip provides operational flexibility: the gain, the baseline level and the threshold can be set independently for each processing channel. The full-custom front-end ASIC, which has 8 identical channels, has been tested successfully.

The CMAD has an adjustable gain between $0.4 \ mV/fC$ and $4.8 \ mV/fC$, in contrast to the old system, where the gain was fixed at $4 \ mV/fC$. The channel noise or, equivalently, the ambiguity in setting the channel threshold is less than 5 mV peak-to-peak, and a single channel can sustain an efficiency of more than 90% at an input pulse frequency of 6 MHz. The CMAD fulfills *all* design requirements of the application in the COMPASS RICH-I detector. The ASIC is scheduled to be installed in the COMPASS experiment in 2008.

#### 7.1.1 Design Motivation for the Last Prototype

The measurements acquired with CMADv2 showed that the offset between the channels, to which all the building blocks contribute, was larger than expected from the calculations, which had been confirmed by simulations. Fig. 7.1 shows a measurement illustrating the offset problem. The offset between the channels, which can be as large as 50 mV according to the measurements, could not be reproduced in simulation at a comparable level.

A possible reason is that some pre-fabrication processes, such as the so-called *chip filling*, are not done in-house but by the technology provider. The MC simulations we performed therefore cannot account numerically for the effect of the steps carried out in the pre-fabrication phase. Even though good layout practices were followed, they cannot necessarily be expected to suppress such effects to the degree suggested by simulations of the extracted circuit.



Figure 7.1: The measurement result showing the offset problem; an identical setting resulting in different effective channel thresholds for different channels and different chips.

Instead of re-designing the relevant blocks from scratch, which would have required considerably more time, we preferred a more practical remedy to the offset problem, described below.

In the CMADv2 prototype, the two most significant bits of the D/A setting the baseline at the output of the shaper and the threshold of the comparator were connected. These two references were therefore not completely independent, leaving less room for channel equalization. In the last design, the references for the baseline and the channel threshold are separated. Additionally, longer output transistors are used in the current sinks to further suppress possible device mismatches, which required only a trivial set of layout modifications.

Controlling the baseline and the threshold independently requires two D/As per channel instead of one, and thus 16 per chip instead of 8. Even though this results in a larger chip area and a higher power consumption, the decision is considered acceptable: there was enough space to host an additional D/A of the same size, and the power budget exceeded the amount needed by the channels. Moreover, the so-called *channel equalization* is a standard calibration procedure in this type of HEP application, so in practice the modifications do not require extra effort of any kind. Additionally, the resulting functionality is fully compatible with the existing read-out system, which therefore needs no modification.

Figure 7.2: Measured s-curve after equalization.

Fig. 7.2 shows a measured s-curve after equalization; the inset zooms into the region around the effective channel threshold. As seen in the zoomed region, where the s-curve fits are shown, the maximum difference between the effective channel thresholds is less than 2% of a single digit, which is a *practical zero*.
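How an effective threshold can be read off an s-curve is sketched below on synthetic data. The error-function shape, the noise width, the scan range and the two threshold values are assumptions chosen for illustration; they are neither the measured data of Fig. 7.2 nor the fit procedure used for it. The effective threshold is taken here as the 50% crossing, found by linear interpolation between scan points.

```python
import math


def s_curve(x, threshold, sigma):
    """Ideal s-curve: Gaussian noise of width sigma around the threshold."""
    return 0.5 * (1.0 + math.erf((x - threshold) / (math.sqrt(2.0) * sigma)))


def effective_threshold(xs, ys, level=0.5):
    """Return the x value at which the curve crosses `level`."""
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= level <= y1 and y1 > y0:
            return x0 + (level - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("curve does not cross the requested level")


codes = list(range(90, 111))                  # hypothetical scan, in digits
channel_a = [s_curve(c, 100.30, 2.0) for c in codes]
channel_b = [s_curve(c, 100.31, 2.0) for c in codes]

thr_a = effective_threshold(codes, channel_a)
thr_b = effective_threshold(codes, channel_b)
spread = abs(thr_a - thr_b)

print(f"thresholds: {thr_a:.3f}, {thr_b:.3f}; spread = {spread:.3f} digit")
```

In this synthetic example the two channels differ by about 0.01 digit, which is the order of magnitude of the residual spread reported after equalization.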

#### 7.1.2 Outlook

Even though the radiation tolerance required of the CMAD is relatively relaxed, since it will reside in a location where a low radiation level is expected, its radiation hardness still needs to be tested. At the time of writing, however, no radiation test results are available. We hope to irradiate the chip and perform the relevant radiation tolerance measurements in the very near future.

### 7.2 The CP-PLL based serializer and the burst-mode CDR

The serializer based on a charge-pump phase-locked loop is designed for the GBT13 transceiver ASIC, which is under development for the upgrade of the LHC. The circuit is implemented in a commercial **130nm CMOS** technology. The serializer depends on the CP-PLL to function properly, so the emphasis is placed on the CP-PLL loop parametrization. In the framework of this thesis a *software tool*, namely CaPPeLLo<sup>1</sup>, is developed for fast evaluation of the CP-PLL loop behavior, which should ease future developments. Additionally, a *burst-mode* capable *clock and data recovery circuit*, consisting of two adopted architectures, is designed in the same technology as a possible functional extension to the GBT13 transceiver.

<sup>1</sup>CaPPeLLo stands for Charge-Pump Phase Locked Loop parametrizer.

In deep sub-micron technologies the performance of circuits depends on layout-related effects to a much greater extent than in older processes. The general comments on the pre-fabrication phase made for the CMAD also hold here. In the relatively recent technology used, the layout work should therefore start at a very early stage, since it has a more pronounced contribution to the final performance. For the building blocks of the serializer, layout and schematic design are inseparable.

An important issue in RF design is the accuracy of the device models. Two transistor models, namely *normal* and *RF*, were available for the GBT design. Critical blocks, such as the VCO, are designed using both models to better predict the behavior of the fabricated chips. This approach can be justified by the validity regions of the device models, which are constructed according to the operating frequencies. If the normal and RF models are interpreted as *low*- and *high*-frequency models, respectively, the question of the frequency at which they diverge is inevitable. As the serializer operates at a moderate frequency of 4.8 *GHz*, a relatively large difference between simulations and fabricated chips can be expected. Test prototypes for the building blocks have been designed; they are either fabricated in shared projects or in the process of being submitted for fabrication. At the time of writing, however, only the very limited number of measurements presented in this thesis are available.

#### 7.2.1 Outlook

As the project is still in a relatively early stage, there will be several prototyping cycles before the production system. In future developments, the collaboration plans to integrate and test the full serializer and the burst-mode CDR at *full speed*. Additionally, after the functional verification, extensive radiation tests are to be performed, since the ASIC is expected to experience *high levels of ionizing radiation*.


## Bibliography

- [1] The Designer's Guide to Verilog-AMS, Kluwer Academic Publishers, 2004
- [2] Verilog-AMS Language Reference Manual, http://www.verilog-ams.com
- [3] Verilog-A Language Reference Manual, http://www.verilog.org/verilogams/htmlpages/public-docs/lrm/VerilogA/verilog-a-lrm-1-0.pdf
- [4] Standard Description Language Based on the Verilog<sup>TM</sup> Hardware Description Language, IEEE Standard, 1364-1995
- [5] Roland Hegemann, Inbetriebnahme eines schnellen Photondetektors für den ringabbildenden Cerenkov-Detektor am COMPASS-Experiment, diploma thesis, Physikalisches Institut, Albert-Ludwigs-Universität, Freiburg
- [6] T.Y. Lee and Y. Cheng, High-Frequency Characterization and Modeling of Distortion Behavior of MOSFETs for RF IC Design, IEEE Journal of Solid-State Circuits, Vol. 39, No. 9, September, 2004
- [7] G. Bunce et al.,  $\Lambda^0$  hyperon polarization in inclusive production by 300 GeV protons on beryllium, Phys. Rev. Lett. 36, 1113-1116 (1976)
- [8] VHDL Language Reference Manual, IEEE Standard, 1076-1993
- [9] iVerilog Language Ref., http://www.icarus.com/eda/verilog
- [10] P. Moreira, A. Marchioro, K. Kloukinas, The GBT a Proposed Architecture for Multi - Gb/s Data Transmission in High Energy Physics, Topical Workshop on Electronics for Particle Physics, TWEPP, 3-7 Sep. 2007, Prague/Czech Republic

- [11] Sorin Martoiu, private communication
- [12] Q. Huang, C. Menolfi, and C. Hammerschmied, A MOSFET-only interface for integrated flow sensors, in Proc. Int. Symp. Circuits Syst. ISCAS, Atlanta, GA, May 1996, Vol. 4, pp. 372-375
- [13] J. Christiansen, A. Marchioro, P. Moreira, TTCrx an ASIC for Timing, Trigger and Control Distribution in LHC Experiments, Proceedings of 2nd Workshop on Electronics for LHC Experiments, Balatonfured, 23-27 September 1996
- [14] J. Christiansen, A. Marchioro, P. Moreira, T. Toifl, A Timing, Trigger and Control Receiver ASIC for LHC Detectors, CERN-EP/MIC, Geneva Switzerland, Version 3.8, Jan. 2003
- [15] P. Moreira, A. Marchioro, QPLL, a quartz crystal based PLL for jitter filtering applications in LHC, Proceedings of the 9th Workshop on Electronics for LHC Experiments, October 2003
- [16] O. Çobanoğlu, CP-PLL iVerilog model, http://www.ph.unito.it/~cobanogl/CDR/iVerilogPLLtutor/iVerilogPLLmodel\_byOC.htm
- [17] Cadence Design Systems, www.cadence.com
- [18] M. Laub, Development of opto-mechanical tools and procedures for the new generation of RICH-detectors at CERN, PhD dissertation, Prague Technique Univ., Prague, 2001
- [19] O. Çobanoğlu, F. Özok, P. V. Vyvre, Development of An On-Line Data Quality Monitor For The Relativistic Heavy-Ion Experiment ALICE, IEEE Real Time Systems Conference, Stockholm, Sweden, 4-10 June, 2005
- [20] Gell-Mann, Phys. Lett. 8 (1964) 214
- [21] S. Gilardoni, E. Metral, Introduction to accelerators, CERN Summer Student Lectures 2006, available on-line at: http://agenda.cern.ch/fullAgenda.php?ida=a062758

- [22] SLAC-SP-017 Collaboration, J. E. Augustin et al., Phys. Rev. Lett. 33 (1974) 1406
- [23] E598 Collaboration, J. J. Aubert et al., Phys. Rev. Lett. 33 (1974) 1404
- [24] http://sl.web.cern.ch/SL/eagroup/NewM2/main.html
- [25] R. P. Feynman, Phys. Rev. Lett. 23 (1969) 1415
- [26] O. Çobanoğlu among other authors, ALICE Technical Design Report of the Trigger Data Acquisition High-Level Trigger and Control System, CERN-LHCC-2003-062, ALICE TDR 10, 7 January 2004
- [27] M. J. Alguard et al., Phys. Rev. Lett. 37 (1976) 1258
- [28] G. Puill et al., Development of MICROMEGAS, a novel position sensitive gas detector with micro-mesh, DAPNIA-SED-00-01-T
- [29] F. Sauli, GEM: A new concept for electron amplification in gas detectors, Nucl. Inst. Methods A386, 531-534, (1997)
- [30] G. Baum et al., Phys. Rev. Lett. 51 (1983) 1135
- [31] European Muon Collaboration, J. Ashman et al., Phys. Lett. B206 (1988) 364
- [32] European Muon Collaboration, J. Ashman et al., Nucl. Phys. B328 (1989) 1
- [33] J. R. Ellis and R. L. Jaffe, Phys. Rev. D9 (1974) 1444
- [34] F. E. Close and R. G. Roberts, Phys. Lett B316 (1993) 165
- [35] Spin Muon Collaboration, B. Adeva et al., Phys. Rev. D58 (1998) 112001
- [36] J. G. Maneatis, Low-Jitter Process-Independent DLL and PLL Based on Self-Biased Technique, Journal of Solid State Circuits, Vol. 31, No. 11, November 1996
- [37] M. Nogawa, 2005 IEEE International Solid-State Circuits Conference, 12.5, 2005

- [38] E142 Collaboration, P. L. Anthony et al., Phys. Rev. D54 (1996) 6620
- [39] A. Bravar, D. v. Harrach, and A. Kotzinian, Large gluon polarization from correlated high-p(T) hadron pairs in polarized electro-production, Phys. Lett. B421, 349-359, (1998)
- [40] Paolo Delaurenti, Analysis and Design of a Fast Binary Front-End Chip for the COMPASS Experiment at CERN, Master Degree Thesis, University of Turin, 2006
- [41] E143 Collaboration, K. Abe et al., Phys. Rev. Lett. 75 (1995) 25
- [42] E154 Collaboration, K. Abe et al., Phys. Rev. Lett. 79 (1997) 26-30
- [43] E155 Collaboration, P. L. Anthony et al., Phys. Lett. B463 (1999) 339
- [44] O. Çobanoğlu among other authors, ALICE DAQ and ECS Users Guide, January 2006, ALICE DAQ Project, ALICE Internal Note/DAQ, ALICE-INT-2005-015
- [45] HERMES Collaboration, A. Airapetian et al., Phys. Lett. B442 (1998) 484
- [46] Xiang-Dong Ji, Deeply-virtual Compton scattering, Phys. Rev. D55, 7114-7125, (1997)
- [47] M. Wiesmann, A Silicon Micro-strip Detector for COMPASS and A First Measurement of the Transverse Polarization of Λ<sup>0</sup>-Hyperons from Quasi-Real Photo-Production, PhD dissertation, Technische Universität München, 2004
- [48] M. Mattson et al., First observation of the doubly charmed baryon $\Xi_{cc}^+$, Phys. Rev. Lett. 89, 112001 (2002)
- [49] S. L. Glashow, J. Iliopoulos, and L. Maiani, Weak interactions with lepton hadron symmetry, Phys. Rev. D2, 1285-1292, (1970)
- [50] HERMES Collaboration, C. Riedl et al., Proceedings of the 16th International Spin Physics Symposium, Trieste, Italy, October 10-16 2004

- [51] K. Ackerstaff et al. Flavor decomposition of the polarized quark distributions in the nucleon from inclusive and semi-inclusive deep inelastic scattering, Phys. Lett. B464, 123-134 (1999)
- [52] COMPASS Collaboration, P. L. Anthony et al., CERN/SPSLC 96-14 (1996)
- [53] E. Nappi et al., The Hadron Muon Collaboration Letter of Intent, CERN/SPSLC 95-27, SPSLC/I204, March 1995
- [54] S. Paul et al., CHEOPS, Letter of Intent, CERN/SPSLC 95-22, SPSLC/I202 March (1995)
- [55] The NMC Collaboration. Detailed Measurements of Structure Functions from Nucleons and Nuclei. Proposal to the SPSC, CERN/SPSC 85-18 SPSC/P210, 27. Feb. 1985
- [56] COMPASS Proposal, CERN / SPSLC 96-14, SPSC / P 297, March 1, 1996
- [57] COMPASS Status Report 2006, CERN / SPSC 2006-013, SPSC-SR-007, April 18, 2006
- [58] F. Gonella and M. Pegoraro, "The MAD", a Full Custom ASIC for the CMS Barrel Muon Chambers Front End Electronics
- [59] G. Baum, et al., BORA: a front-end board, with local intelligence, for the RICH detector of the Compass Collaboration, NIM, Section A, 433 (1999) 426-431
- [60] G. Baum, et al., The COMPASS RICH Project, NIM-A, 433 (1999) 207-211
- [61] G. Baum et al., The COMPASS RICH-I Read-Out System, NIM-A 502 (2003) 246-250
- [62] D. M. Monticelli, A Quad CMOS Single-Supply Op Amp with Rail-to-Rail Output Swing, IEEE Journal of Solid-State Circuits, Vol. SC-21, No. 6, December 1986

- [63] M. van Ierssel et al., A 3.2Gb/s Semi-Blind-Oversampling CDR, ISSCC, Session 18, 18.5, Clock and Data Recovery, 2006
- [64] J. Kim et al., Multi-Gigabit-Rate Clock and Data Recovery Based on Blind Oversampling, IEEE Communications Magazine, 2003
- [65] L. M. DeVito, A Versatile Clock Recovery Architecture and Monolithic Implementation, Monolithic Phase-Locked Loops and Clock Recovery Circuits, IEEE Press, pp. 405-430, 1996
- [66] J. Yuan et al., High-Speed CMOS Circuit Technique, IEEE Journal of Solid State Circuits, Vol. 24, no. 1, 1989
- [67] A. E. Dunlop et al., 150/30Mb/s CMOS Non-Oversampled Clock and Data Recovery Circuits with Instantaneous Locking and Jitter Rejection, ISSCC, Session 2, 2.7, 1995
- [68] G. De Geronimo, P. O'Connor and J. Grosholz, "A CMOS Baseline Holder (BLH) for Readout ASICs", IEEE TNS vol. 47, no. 3, June 2000
- [69] Klaas Bult, and Govert J.G.M. Geelen, An Inherently Linear and Compact MOST-Only Current Division Technique, IEEE Journal of Solid State Circuits, Vol. 27, No. 12, 1992
- [70] Low Dropout Voltage Regulator Operation and Performance, Application Report SLVA072, 1999, Texas Instruments
- [71] Snoeys W.J., Gutierrez T.A.P., Anelli G., A new NMOS layout structure for radiation tolerance, IEEE Transactions on Nuclear Science, Vol. 49, No. 4, Part 1, 2002
- [72] G. Anelli, et al., Radiation tolerant VLSI circuits in standard deep sub-micron CMOS technologies for the LHC experiments: practical design aspects, IEEE Transactions on Nuclear Science, Vol. 46, No. 6, 1999

- [73] C.M. Hammerschmied, et al., Design and Implementation of an Untrimmed MOSFET-Only 10-Bits A/D Converter with -79-dB THD, IEEE Journal of Solid State Circuits, Vol. 33, No. 8, 1998
- [74] COMPASS Proposal, CERN/SPSLC 96-14, SPSC/P 297, March 1, 1996
- [75] P. Moreira, et al., G-Link and Gigabit Ethernet Compliant Serializer for LHC Data Transmission, IEEE Nuclear Science Symposium Conference Record, October 15-20, 2000, Lyon, France, pp. 9.6-9.9
- [76] M.J.M. Pelgrom, et al., Matching properties of MOS transistors, IEEE Journal of Solid-State Circuits, Vol. 24, No. 5, 1989
- [77] B.G.Taylor, Timing Distribution at the LHC, Proc. 8th Workshop on Electronics for LHC Experiments, Colmar, France, 9-13 September 2002, CERN 2002-003, pp. 63-74
- [78] P. Moreira, G. Cervelli, J. Christiansen, F. Faccio, A. Kluge, A. Marchioro, T. Toifl, A radiation tolerant gigabit serializer for LHC data transmission, Proc. of the 7th Workshop on Electronics for LHC Experiments, Stockholm, October 2001, pp 150-154
- [79] A. De Roeck, The LHC Upgrade, CERN Summer Student Lectures 2006, available on-line at: http://agenda.cern.ch/fullAgenda.php?ida=a062750
- [80] O. Çobanoğlu, et al., "CMAD", a Full Custom ASIC for the Upgrade of COMPASS RICH-1, LECC2006, 12th Workshop on Electronics for LHC and Future Experiments, 25-29 September 2006, Valencia, Spain
- [81] G. Papotti, An Error-Correcting Line Encoding ASIC for a HEP Rad-Hard Multi-GigaBit Optical Link, PRIME-2006 Conference Record, Lecce, Italy
- [82] R. Brun and F. Rademakers, ROOT An object oriented data analysis framework, Nuclear Instruments and Methods in Physics Research Section A, Volume 389, Issues 1-2, 11 April 1997

- [83] Chi-Hung Lin and K. Bult, A 10-b 500-MSample/s CMOS DAC in 0.6mm<sup>2</sup>, IEEE Journal of Solid-State Circuits, Vol. 33, No. 12, 1998
- [84] Johan H. Huijsing, Operational Amplifiers: Theory and Design, Kluwer, 2001
- [85] O. Çobanoğlu, Embedded D/A Converters for High Energy Physics Instrumentation, 4th Eurasian Conference, Nuclear Science and Its Application, Baku, Azerbaycan, 2006
- [86] P. Moreira, private communication
- [87] Snoeys W., Anelli G., et al., Integrated circuits for particle physics experiments, IEEE Journal of Solid-State Circuits, Vol. 35, No. 12, 2000
- [88] Austria Micro Systems,  $0.35\mu m$  Process Manual, ENG-228
- [89] B. Razavi, Design of Integrated Circuits for Optical Communications, Chicago, McGraw Hill, 2002
- [90] A. Hajimiri et al., Jitter and Phase Noise in Ring Oscillators, IEEE Journal of Solid-State Circuits, Vol. 34, No. 6, 1999
- [91] B. Razavi, Design of Analog CMOS Integrated Circuits, ISBN 0-07-238032-2
- [92] F. M. Gardner, Phase-lock Techniques, John Wiley & Sons
- [93] F. M. Gardner, Charge-Pump Phase-Lock Loops, IEEE Transactions on Communications, Vol. COM-28, No. 11, 1980
- [94] T. Y. K. Wong et al., A 10GB/s ATM Data Synchronizer, IEEE Journal of Solid-State Circuits, Vol. 31, No. 10, 1996
- [95] G. Papotti, Architectural Studies of a Radiation-Hard Transceiver ASIC in 0.13μm CMOS for Digital Optical Links in High Energy Physics Applications, Ph.D. thesis, University of Parma, January 2007.
- [96] S. Gogaert et al., A Skew Tolerant CMOS Level-Based ATM Data-Recovery System without PLL Topology, IEEE Custom Integrated Circuits Conference, 1997

- [97] M. Nakamura et al., A 156 Mbps CMOS Clock and Data Recovery for Burst-Mode Transmission, Symposium on VLSI Circuits Digest of Technical Papers, 1996
- [98] C. K. K. Yang, A 0.5-µm CMOS 4.0-Gbit/s Serial Link Transceiver with Data Recovery Using Oversampling, IEEE Journal of Solid-State Circuits, Vol. 33, No. 5, 1998
- [99] R. S. Co et al., Optimization of Phase-Locked Loop Performance in Data Recovery Systems, IEEE Journal of Solid-State Circuits, Vol. 29, No. 9, 1994
- [100] R. C. Walker, Designing Bang-Bang PLLs for Clock and Data Recovery in Serial Data Transmission Systems, pp. 34-45, a chapter appearing in "Phase-Locking in High-Performance Systems - From Devices to Architectures", IEEE Press, 2003, ISBN 0-471-44727-7
- [101] J. Lee et al., Analysis and Modeling of Bang-Bang Clock and Data Recovery Circuits, IEEE Journal of Solid-State Circuits, Vol. 39, No. 9, 2004
- [102] M. Thamsirianunt, CMOS VCOs for PLL Frequency Synthesis in GHz Digital Mobile Radio Communications, IEEE Custom Integrated Circuits Conference, 1995
- [103] M. H. Perrot et al., A Modeling Approach for Sigma-Delta Fractional-N Frequency Synthesizers Allowing Straightforward Noise Analysis, IEEE Journal of Solid-State Circuits, Vol. 37, No. 8, 2002
- [104] F. M. Gardner, Hangup in Phase-Lock Loops, IEEE Trans. Commun., Vol. COM-25, pp. 1210-1214, 1977
- [105] McNeill, J.A., Jitter in Ring Oscillators, IEEE Journal of Solid-State Circuits, Vol. 32, No. 6, 1997
- [106] G. Nash, Phase-Locked Loop Design Fundamentals, Motorola Application Note, AN535, 1994

- [107] A. A. Abidi, Phase Noise and Jitter in CMOS Ring Oscillators, IEEE Journal of Solid-State Circuits, Vol. 41, No. 8, 2006
- [108] B. Analui and A. Hajimiri, Instantaneous Clock-less Data Recovery and De-multiplexing, IEEE Transactions on Circuits and Systems-II: Express Briefs, Vol. 52, No. 8, August 2005
- [109] V. von Kaenel et al., A 320 MHz 1.5 mW at 1.35 V CMOS PLL for Microprocessor Clock Generation, IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, 1996
- [110] C. H. Park, A Low-Noise 900MHz VCO in 0.6-µm CMOS, IEEE Journal of Solid-State Circuits, Vol. 34, No. 5, 1999
- [111] B. C. Kuo, Automatic Control Systems, Prentice-Hall, Inc., New Jersey, 1962
- [112] P. McCollum and B. Brown, Laplace Transform Tables and Theorems, Holt, New York, 1965

# Appendix A

### Second-Order systems

Generic second-order system behavior (directly relevant both to the operational amplifiers and to the CP-PLL described in this thesis) and parameter selection for a proper design are summarized in this chapter, together with some practical details. Device-level implementation details are not covered.

## A.1 Introduction : Time and Frequency Domain Relationships

There are fundamentally two main reasons for considering the time and frequency domain relationships of second-order systems. The first one is that many control loops (e.g. operational amplifiers, phase locked loops, etc.) can be modeled with reasonable accuracy assuming just a second-order system. Such a procedure represents a reasonable compromise between complexity and accuracy of the model. The second reason is that these relationships allow us to predict frequency domain performance from the simpler-to-measure time domain performance.

### A.1.1 General Second-Order Systems in the Frequency Domain

The general transfer function of a low-pass, second-order system in the frequency domain using voltage variables is

$$A(s) = \frac{V_o(s)}{V_{in}(s)} = \pm \frac{A_0 \omega_n^2}{s^2 + 2\xi \omega_n s + \omega_n^2} = \pm \frac{A_0 \omega_n^2}{s^2 + (\frac{\omega_0}{Q})s + \omega_0^2}$$
(A.1.1)

where $A_0$ is the low-frequency gain of $V_o(s)/V_{in}(s)$, $\omega_0 = \omega_n$ is the pole frequency in rad/s, and $\xi = 1/(2Q)$ is the damping factor. The roots of the denominator of Eq. A.1.1 are illustrated in Fig. A.1, where point A on the imaginary axis is equal to $j\omega_n\xi\sqrt{(1/\xi)^2 - 1} = j\omega_n\sqrt{1-\xi^2}$ and point B on the real axis is equal to $-\xi\omega_n$. The length of the vectors is the natural frequency of the loop and the cosine of the angle, $\theta$, with respect to the real axis gives the damping factor, for $0^o < \theta < 90^o$.

The magnitude of frequency response can be found from Eq. A.1.1 as:

$$|A(j\omega)| = \frac{A_0 \omega_n^2}{\sqrt{(\omega_n^2 - \omega^2)^2 + (2\xi\omega_n\omega)^2}}$$
(A.1.2)

However, Eq. A.1.2 may be generalized by normalizing the amplitude with respect to $A_0$ and the radian frequency with respect to $\omega_n$ to give Eq. A.1.3 as:

$$\frac{|A(j\omega/\omega_n)|}{A_0} = \frac{1}{\sqrt{[1 - (\omega/\omega_n)^2]^2 + (2\xi\omega/\omega_n)^2}}$$
(A.1.3)



Figure A.1: Pole locations of general second-order system.



Figure A.2: Gain magnitude response of a low-pass second-order system for various damping factors.

A plot of Eq. A.1.3 in dB versus  $\log \omega/\omega_n$  is shown in Fig. A.2, where  $\xi$  is used as a parameter. By taking the derivative of Eq. A.1.3 with respect to  $\omega/\omega_n$  and setting it to zero, the peak magnitude of normalized transfer function, when  $\xi < 0.707$ , can be found as:

$$M_p = \frac{1}{2\xi\sqrt{1-\xi^2}}$$
(A.1.4)
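As a quick numerical cross-check of Eq. A.1.4, the following minimal Python sketch evaluates the normalized magnitude of Eq. A.1.3 on a fine grid and compares its maximum with the closed-form peak. The function names and the grid resolution are illustrative choices, not part of the original work.

```python
import math

def normalized_gain(u, xi):
    """|A(j*omega)|/A0 from Eq. A.1.3, with u = omega/omega_n."""
    return 1.0 / math.sqrt((1.0 - u * u) ** 2 + (2.0 * xi * u) ** 2)

def peak_gain(xi):
    """Closed-form peak M_p of Eq. A.1.4, valid only for xi < 1/sqrt(2)."""
    if not 0.0 < xi < 1.0 / math.sqrt(2.0):
        raise ValueError("Eq. A.1.4 requires 0 < xi < 0.707")
    return 1.0 / (2.0 * xi * math.sqrt(1.0 - xi * xi))

def peak_gain_numeric(xi, steps=20000):
    """Brute-force maximum of Eq. A.1.3 over 0 <= u <= 2 (assumed grid)."""
    return max(normalized_gain(2.0 * k / steps, xi) for k in range(steps + 1))

if __name__ == "__main__":
    for xi in (0.1, 0.3, 0.5):
        print(xi, peak_gain(xi), peak_gain_numeric(xi))
```

The two values should agree closely for any damping factor below 0.707, since the peak always lies at $\omega/\omega_n = \sqrt{1-2\xi^2} < 1$.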

The second-order transfer function of Eq. A.1.1 is found in the analysis of many practical control loops. Considering the single-loop, feedback system shown in Fig. A.3, the closed loop gain A(s) can be expressed as:

$$A(s) = \frac{V_o(s)}{V_i(s)} = \frac{\alpha a\beta}{1 + a\beta}$$
(A.1.5)

Let us assume that  $\alpha$  and  $\beta$  are real and that a is the amplifier's gain and can be approximated as

$$a(s) = \frac{a_0 \omega_1 \omega_2}{(s + \omega_1)(s + \omega_2)}$$
(A.1.6)



Figure A.3: Example single-loop feedback system.

where  $a_0$  is the dc gain of the amplifier and  $\omega_1$  and  $\omega_2$  are negative real axis poles. Substituting Eq. A.1.6 into Eq. A.1.5 gives:

$$A(s) = (\alpha\beta) \frac{a_0 \omega_1 \omega_2}{s^2 + (\omega_1 + \omega_2)s + \omega_1 \omega_2 (1 + a_0\beta)}$$
(A.1.7)

Comparing Eq. A.1.1 with Eq. A.1.7 results in the following identifications:

$$A_0 = \frac{\alpha a_0 \beta}{1 + a_0 \beta} \tag{A.1.8}$$

$$\omega_n = \omega_0 = \sqrt{\omega_1 \omega_2 (1 + a_0 \beta)} \tag{A.1.9}$$

$$2\xi = 1/Q = \frac{\omega_1 + \omega_2}{\sqrt{\omega_1 \omega_2 (1 + a_0 \beta)}}$$
(A.1.10)
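The identifications of Eqs. A.1.8 to A.1.10 can be evaluated directly. The sketch below is only an illustration: the function name and the numerical values of the example amplifier (dc gain, pole frequencies, feedback factor) are assumptions chosen for convenience.

```python
import math

def closed_loop_parameters(a0, w1, w2, beta, alpha=1.0):
    """Return (A0, omega_n, xi) of the closed loop for a two-pole amplifier.

    a0     : dc gain of the amplifier
    w1, w2 : magnitudes of the two real-axis poles in rad/s
    beta   : feedback factor (assumed real)
    alpha  : input scaling factor (assumed real)
    """
    loop = 1.0 + a0 * beta
    A0 = alpha * a0 * beta / loop          # Eq. A.1.8
    wn = math.sqrt(w1 * w2 * loop)         # Eq. A.1.9
    xi = (w1 + w2) / (2.0 * wn)            # Eq. A.1.10
    return A0, wn, xi

if __name__ == "__main__":
    # Hypothetical amplifier: a0 = 1000, poles at 1 krad/s and 1 Mrad/s, unity feedback
    print(closed_loop_parameters(1000.0, 1e3, 1e6, 1.0))
```

For this hypothetical amplifier the damping factor comes out close to 0.5, which illustrates how a large loop gain pushes the closed loop towards the under-damped region when the two poles are not separated widely enough.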

The same principles can be applied to second-order band-pass and high-pass systems, but the low-pass case is of more practical interest here and will be the only one considered in this chapter. It is also possible for $\beta$ and/or $\alpha$ to be frequency dependent, which further complicates the analysis.

#### A.1.2 General Second-Order Systems in the Time Domain

Measurements in the frequency domain are time consuming; it is therefore of interest to extract the frequency domain behavior from the time domain performance. Such an approach is briefly developed in this part. The general response of Eq. A.1.1 to a unit step can be written as

$$v_0(t) = A_0 \left[ 1 - \frac{1}{\sqrt{1 - \xi^2}} e^{-\xi \omega_n t} \sin(\sqrt{1 - \xi^2} \omega_n t + \phi) \right]$$
(A.1.11)

where  $\phi$  is as follows:

$$\phi = \tan^{-1}\left(\frac{\sqrt{1-\xi^2}}{\xi}\right) \tag{A.1.12}$$



Figure A.4: Step response as a function of  $\xi$  for a low-pass, type-I, second-order system.

The step response plotted in normalized amplitude versus radians is shown in Fig. A.4. Care must be taken interpreting this figure: it is normalized, the geometric mean of  $\omega_1$  and  $\omega_2$  remains constant, that is,  $\omega_n$  remains constant and the time constant remains fixed as  $\xi$  varies<sup>1</sup>.

Let us consider the under-damped case, that is, $\xi < 1$. In the under-damped case there will always be an overshoot, which is defined as:

$$Overshoot = \frac{Peak \, value - Final \, value}{Final \, value} = exp\left(\frac{-\pi\xi}{\sqrt{1-\xi^2}}\right) \tag{A.1.13}$$

<sup>1</sup>When compensating amplifiers, this situation does not exist. The geometric mean of the dominant and non-dominant poles is increased until the desired settling response is achieved.

The time instant at which the overshoot reaches its peak is denoted by $t_p$ and is calculated as:

$$t_p = \frac{\pi}{\omega_n \sqrt{1 - \xi^2}} \tag{A.1.14}$$

Thus, the measurement of the overshoot permits the calculation of  $\xi$ . With this information and measurement of  $t_p$ , one can calculate  $\omega_n$  from Eq. A.1.14. Therefore, the frequency response of a low-pass, second-order feedback system with  $\xi < 1$  can be determined by measuring the overshoot and  $t_p$  of the step response.
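The extraction procedure described above amounts to inverting Eq. A.1.13 for $\xi$ and then solving Eq. A.1.14 for $\omega_n$. A minimal Python sketch under these assumptions is given below; the function names and the example measurement values are hypothetical.

```python
import math

def damping_from_overshoot(overshoot):
    """Invert Eq. A.1.13: overshoot = exp(-pi*xi/sqrt(1-xi^2)), 0 < overshoot < 1."""
    if not 0.0 < overshoot < 1.0:
        raise ValueError("fractional overshoot must lie between 0 and 1")
    log_term = -math.log(overshoot)
    return log_term / math.sqrt(math.pi ** 2 + log_term ** 2)

def natural_frequency(t_p, xi):
    """Solve Eq. A.1.14 for omega_n given the peak time t_p in seconds."""
    return math.pi / (t_p * math.sqrt(1.0 - xi * xi))

if __name__ == "__main__":
    # Hypothetical measurement: 10% overshoot peaking 1 ms after the step
    xi = damping_from_overshoot(0.10)
    wn = natural_frequency(1e-3, xi)
    print(xi, wn)
```

With $\xi$ and $\omega_n$ known, Eq. A.1.2 gives the complete magnitude response without any frequency domain measurement.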

Two other cases remain, namely $\xi = 1$ and $\xi > 1$. Both are very rare in amplifier design. However, some control loops can have damping ratios of the order of 10 (e.g. an over-damped phase locked loop where overshoot is expected to be as low as 0.1dB). For these conditions, the best method is to refer to Fig. A.4 and match the step response to one of the curves for $\xi > 1$.

### A.1.3 Determination of Phase Margin and Crossover Frequency from $\xi$ and $\omega_n$

The previous subsection showed how $\xi$ and $\omega_n$ of a general second-order system can be extracted from the closed-loop step response. This subsection deals with predicting the phase margin and the crossover frequency from the values of $\xi$ and $\omega_n$ so determined. Fig. A.5 shows the definitions of phase margin, $\phi_m$, and crossover frequency, $\omega_c$, on Bode plots of an arbitrary system. The crossover frequency, denoted by A in the figure, is the point where the gain of the system drops to unity (0 dB), and the phase margin, denoted by B in the figure, is how far the phase of the system is from being fully inverted at that frequency.

It will be convenient, in developing the relationships, to assume  $\beta$  and  $\alpha$  are real values. Solving Eq. A.1.5 for *a* yields

$$a\beta = \frac{1/\alpha}{(1/A) - (1/\alpha)}$$
 (A.1.15)

and substituting Eq. A.1.1 into Eq. A.1.15 gives the loop gain as:

$$a\beta = \frac{A_0\omega_n^2/\alpha}{(s^2 + 2\xi\omega_n s + \omega_n^2) - (A_0\omega_n^2/\alpha)} = \frac{A_0/\alpha}{(\frac{s}{\omega_n})^2 + 2\xi(\frac{s}{\omega_n}) + 1 - \frac{A_0}{\alpha}}$$
(A.1.16)  
December 17, 2007



Figure A.5: Definitions of crossover frequency (A) and phase margin (B).

When  $|a\beta| = 1$ , then  $\omega = \omega_c$  so that Eq. A.1.16 becomes:

$$|a\beta| = \frac{A_0/\alpha}{\sqrt{\left[1 - \frac{A_0}{\alpha} - \left(\frac{\omega_c}{\omega_n}\right)^2\right]^2 + \left[2\xi\left(\frac{\omega_c}{\omega_n}\right)\right]^2}}$$
(A.1.17)

Since  $|a\beta| = 1$ , we may solve Eq. A.1.17 for  $\omega_c$  to get:

$$\omega_c = \omega_n \left[ \sqrt{\left[ 2\xi^2 - (1 - \frac{A_0}{\alpha}) \right]^2 - (1 - \frac{2A_0}{\alpha})} - 2\xi^2 + (1 - \frac{A_0}{\alpha}) \right]^{1/2}$$
(A.1.18)

Knowing $A_0$, $\alpha$, $\xi$ and $\omega_n$, one may calculate the crossover frequency of a second-order system. As an example, in an operational amplifier circuit, $\alpha = A_0$ so that Eq. A.1.18 becomes:

$$\omega_c = \omega_n \left[ \sqrt{4\xi^4 + 1} - 2\xi^2 \right]^{1/2} \tag{A.1.19}$$

The phase of  $a\beta$  can be found from Eq. A.1.16. However,  $\pm\pi$  must be added to this value to account for the minus sign of the summing junction of Fig. A.3; thus:

$$\Phi_m = -tan^{-1} \left( \frac{2\xi\omega_c/\omega_n}{(1 - A_0/\alpha) - (\omega_c/\omega_n)^2} \right)$$
(A.1.20)

Since  $A_0 = \alpha$ , one may write Eq. A.1.20 as:

$$\Phi_m = tan^{-1} \left( \frac{2\xi}{(\omega_c/\omega_n)} \right)$$
(A.1.21)
  



Figure A.6: Overshoot as a function of the damping factor for a second-order system.

Substituting Eq. A.1.19 into Eq. A.1.21 yields

$$\Phi_m = \tan^{-1} \left( \frac{2\xi}{\sqrt{\sqrt{4\xi^4 + 1} - 2\xi^2}} \right)$$
(A.1.22)

where an equivalent form of Eq. A.1.22 is:

$$\Phi_m = \cos^{-1} \left( \sqrt{4\xi^4 + 1} - 2\xi^2 \right) \tag{A.1.23}$$

Fig. A.6 and Fig. A.7 show the plots of *Overshoot* and *phase margin*,  $\Phi_m$ , as functions of the *damping factor*,  $\xi$ . Therefore, the time domain performance characterized by  $\xi$  permits the designer to estimate a value of phase margin using Eq. A.1.23 or the plots. As an example, if the design specifications require an overshoot of 0.1, then the damping factor must be around 0.6 leading to a phase margin of around 58 degrees.
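The relations between damping factor, overshoot, crossover frequency and phase margin can also be evaluated numerically instead of being read from the plots. The following Python sketch implements Eqs. A.1.13, A.1.19, A.1.22 and A.1.23 for the case $\alpha = A_0$; the function names are illustrative.

```python
import math

def overshoot(xi):
    """Fractional overshoot of Eq. A.1.13 for 0 < xi < 1."""
    return math.exp(-math.pi * xi / math.sqrt(1.0 - xi * xi))

def crossover_ratio(xi):
    """omega_c / omega_n from Eq. A.1.19 (case alpha = A0)."""
    return math.sqrt(math.sqrt(4.0 * xi ** 4 + 1.0) - 2.0 * xi ** 2)

def phase_margin_deg(xi):
    """Phase margin in degrees from Eq. A.1.23."""
    return math.degrees(math.acos(math.sqrt(4.0 * xi ** 4 + 1.0) - 2.0 * xi ** 2))

def phase_margin_deg_atan(xi):
    """Same quantity via Eq. A.1.22, used here as a consistency check."""
    return math.degrees(math.atan(2.0 * xi / crossover_ratio(xi)))

if __name__ == "__main__":
    for xi in (0.4, 0.6, 0.8):
        print(xi, overshoot(xi), crossover_ratio(xi), phase_margin_deg(xi))
```

The arctangent and arccosine forms should return the same phase margin for any $\xi > 0$, which is a convenient check on the algebra leading from Eq. A.1.22 to Eq. A.1.23.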



Figure A.7: Phase margin in degrees as a function of the damping factor for a second-order system.

## A.2 A Practical Case Study : PLL Parametrization

In this section, practical definitions are given using the parametrization of a type-II, second-order PLL as an example. Fig. A.8 shows the generic PLL architecture and defines the parameters used throughout the section: $\theta_i(s)$, $\theta_e(s)$ and $\theta_o(s)$ represent the input phase, the phase error and the output phase, respectively. G(s) and H(s) represent the products of the individual feed-forward and feedback transfer functions, respectively. $K_p$, $K_f$, $K_o$ and $K_n$ are the gains of the individual building blocks. $f_i$ and $f_o$ are the input and the output frequencies, respectively. Using servo theory, the following relations are valid [111]:

$$\theta_e(s) = \frac{\theta_i(s)}{1 + G(s)H(s)} \tag{A.2.24}$$

$$\theta_o(s) = \frac{G(s)\theta_i(s)}{1 + G(s)H(s)} \tag{A.2.25}$$

The phase detector produces a voltage proportional to the phase difference between the signals $\theta_i$ and $\theta_o/N$. This voltage, upon filtering, is used as the control signal for the VCO. Since the VCO generates a frequency which is proportional to its input voltage, any time-variant signal appearing on the control signal (not shown in the figure) will frequency modulate the VCO. The output frequency is $f_o = Nf_i$ during phase lock. The phase detector, the filter, and the VCO compose the feed-forward path (G(s)), while the feedback path (H(s)) contains the divider (%N). Removing the programmable counter produces unity gain in the feedback path (N = 1), so that the output frequency is equal to that of the input.

Figure A.8: Phase locked loop and the parameters used in the example.

#### A.2.1 Definitions

Before starting the parametrization, the high-level definitions of practical interest are reviewed in this subsection for clarity.

#### Type and Order

These two terms are used somewhat indiscriminately in published literature, and to date there has not been an established standard. However, the most common usage will be identified and used in this subsection.

The type of a system refers to the number of poles of the loop transfer function G(s)H(s) located at the origin. As an example

$$G(s)H(s) = \frac{10}{s(s+10)}$$
(A.2.26)

is a *type I* system since there is only one pole at the origin.

The order of a system refers to the highest degree of the polynomial expression 1 + G(s)H(s) which is named as *Characteristic Equation* (C.E.). The roots of the characteristic equation become the closed loop poles of the overall transfer function. As an example

$$G(s)H(s) = \frac{10}{s(s+10)}$$
(A.2.27)

$$1 + G(s)H(s) = 1 + \frac{10}{s(s+10)} = 0$$
(A.2.28)

therefore

$$C.E. = s(s+10) + 10 = s^2 + 10s + 10$$
 (A.2.29)

which is a *second-order* polynomial, thus the given G(s)H(s) forms a *Type-I*, *second-order* system.
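The two definitions can be applied mechanically to any rational loop transfer function. In the sketch below, polynomials are represented as coefficient lists ordered from the highest power of *s* downwards; this representation and the function names are assumptions made for illustration.

```python
def system_type(den):
    """Number of poles of G(s)H(s) at the origin.

    den lists the denominator coefficients from the highest power of s down,
    so each trailing zero coefficient corresponds to one factor of s.
    """
    count = 0
    for coeff in reversed(den):
        if coeff != 0:
            break
        count += 1
    return count

def characteristic_polynomial(num, den):
    """Coefficients of den(s) + num(s), the numerator of 1 + G(s)H(s)."""
    size = max(len(num), len(den))
    num = [0] * (size - len(num)) + list(num)
    den = [0] * (size - len(den)) + list(den)
    return [d + n for d, n in zip(den, num)]

def system_order(num, den):
    """Degree of the characteristic equation."""
    ce = characteristic_polynomial(num, den)
    while len(ce) > 1 and ce[0] == 0:
        ce.pop(0)
    return len(ce) - 1

if __name__ == "__main__":
    # G(s)H(s) = 10 / (s(s + 10)) = 10 / (s^2 + 10 s)
    num, den = [10], [1, 10, 0]
    print(system_type(den), system_order(num, den), characteristic_polynomial(num, den))
```

For the example of Eq. A.2.27 this reproduces the characteristic equation $s^2 + 10s + 10$ of Eq. A.2.29.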

Another example which is of more practical interest is

$$G(s)H(s) = \frac{(s+a)k}{s^2}$$
 (A.2.30)

which is a type-II, second-order system since there are two poles at the origin. A zero is added to provide stability<sup>2</sup>. The root locus shown in Fig. A.9 has two branches beginning at the origin with one asymptote located at 180 degrees. The *center of gravity* is s = a; however, with only one asymptote, there is no intersection at this point. The root locus lies on a circle centered at s = -a and continues on all portions of the negative real axis to left of the zero. The *breakaway* point is s = -2a.

The respective phase or output frequency response of this type-II, second-order system to a step input is shown in Fig. A.10. The required  $\omega_n$  can be determined by the use of the graph when  $\xi$  and the lock-up time are given.

#### Bandwidth

The -3dB bandwidth of the PLL is given by

$$\omega_{-3dB} = \omega_n \sqrt{1 + 2\xi^2 + \sqrt{2 + 4\xi^2 + 4\xi^4}}$$
(A.2.31)

for a type-II, second-order system [92].
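Eq. A.2.31 can be checked numerically. The sketch below assumes the usual closed-loop transfer function of a type-II, second-order loop, $H(s) = (2\xi\omega_n s + \omega_n^2)/(s^2 + 2\xi\omega_n s + \omega_n^2)$, which is not written out in the text above, and verifies that its squared magnitude equals 1/2 at the computed bandwidth. The function names are illustrative.

```python
import math

def pll_bandwidth(wn, xi):
    """-3 dB bandwidth in rad/s from Eq. A.2.31."""
    return wn * math.sqrt(1.0 + 2.0 * xi ** 2
                          + math.sqrt(2.0 + 4.0 * xi ** 2 + 4.0 * xi ** 4))

def closed_loop_gain_sq(w, wn, xi):
    """|H(jw)|^2 of the assumed type-II, second-order closed loop."""
    u = w / wn
    return (1.0 + (2.0 * xi * u) ** 2) / ((1.0 - u * u) ** 2 + (2.0 * xi * u) ** 2)

if __name__ == "__main__":
    wn, xi = 4500.0, 0.8
    w3db = pll_bandwidth(wn, xi)
    print(w3db, closed_loop_gain_sq(w3db, wn, xi))
```

Note that the bandwidth is always larger than $\omega_n$ and grows with the damping factor, because the zero of the loop filter adds gain above the natural frequency.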

 $<sup>^{2}</sup>$ Without the zero, the poles would move along the imaginary axis as a function of gain and the system would -at all times- be oscillatory in nature.



Figure A.9: Root locus of a type-II, second-order system.



Figure A.10: Step response of a type-II, second-order system.

#### **Steady State Condition**

In evaluating a system,  $\theta_e(s)$  must be examined in order to determine whether the steady state and transient characteristics are optimum. Various inputs can be applied to a system. Typically, these include *step position*, *velocity*, and *acceleration*.

The steady state evaluation can be simplified with the use of the final value theorem of the Laplace transform. This theorem permits finding the steady state system error $\theta_e(t \to \infty)$ resulting from the input $\theta_i(s)$ without transforming back to the time domain [112] as follows

$$\lim_{t \to \infty} [\theta_e(t)] = \lim_{s \to 0} [s\theta_e(s)] \tag{A.2.32}$$

where

$$\theta_e(s) = \frac{\theta_i(s)}{1 + G(s)H(s)} \tag{A.2.33}$$

and the three typical inputs to a phase locker are as follows:

- 1. Step Position in frequency domain is $\theta_i(s) = C_p/s$, where $C_p$ is the magnitude of the phase step in radians. This corresponds to shifting the phase of the incoming reference signal by $C_p$ radians.
- 2. Step Velocity in frequency domain is $\theta_i(s) = C_v/s^2$, where $C_v$ is the magnitude of the rate of change of phase in rad/s. This corresponds to applying an input frequency that is different from the fed-back portion of the VCO frequency. Thus, $C_v$ is the frequency difference in rad/s seen at the phase detector.
- 3. Step Acceleration in frequency domain is $\theta_i(s) = 2C_a/s^3$, where $C_a$ is the magnitude of the frequency rate of change in $rad/s^2$. This is characterized by a time-variant frequency input.

Typical loop G(s)H(s) transfer functions for types I, II, and III are:

$$Type - I \qquad G(s)H(s) = \frac{K}{s(s+a)} \tag{A.2.34}$$

$$Type - II \qquad G(s)H(s) = \frac{K(s+a)}{s^2}$$
 (A.2.35)

$$Type - III \qquad G(s)H(s) = \frac{K(s+a)(s+b)}{s^3}$$
 (A.2.36)

|                   | Type-I     | Type-II  | Type-III |
|-------------------|------------|----------|----------|
| Step Position     | Zero       | Zero     | Zero     |
| Step Velocity     | Constant   | Zero     | Zero     |
| Step Acceleration | Increasing | Constant | Zero     |

Table A.1: Steady-state phase errors for various system types

As an example, the *final value* of the *phase error* for a type-I system with a *step* phase input is found by using Eq. A.2.33 and the step position expression $\theta_i(s) = C_p/s$ as:

$$\theta_e(s) = \left(\frac{1}{1 + \frac{K}{s(s+a)}}\right) \left(\frac{C_p}{s}\right) = \frac{(s+a)C_p}{s^2 + as + K}$$
(A.2.37)

$$\theta_e(t=\infty) = \lim_{s \to 0} \left[ s \left( \frac{s+a}{s^2+as+K} \right) C_p \right] = 0 \tag{A.2.38}$$

Thus, the final value of the phase error is zero when a step position (i.e. phase step) is applied. Similarly, applying the three inputs into type I, II, and III systems and utilizing the final value theorem, Table A.1 can be constructed showing the respective steady state phase errors.

A zero phase error identifies phase coherence between the two input signals at the phase detector. A constant phase error identifies a phase differential between the two input signals at the phase detector input. The magnitude of this differential phase error is proportional to the magnitude of the input step and inversely proportional to the loop gain (for instance, $aC_v/K$ for a type-I system with a step velocity input). A continually increasing phase error identifies a time rate of change of phase. This is an unlocked condition for the phase locked loop.

Using Table A.1, the system type can be determined for specific inputs. For instance, if it is desired for a PLL to track a reference frequency (step velocity) with zero phase error, a minimum of type-II is required.
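The entries of Table A.1 can be reproduced numerically by evaluating $s\theta_e(s)$ at a small real value of *s* as an approximation of the limit in Eq. A.2.32. The sketch below does this for the three loop types and the three inputs; the numerical values of *K*, *a* and *b*, the evaluation point and the function names are all illustrative assumptions, and the type-III loop is taken with zeros at $-a$ and $-b$ (which may coincide).

```python
def loop_gain(s, sys_type, K=100.0, a=10.0, b=20.0):
    """Typical G(s)H(s) for type-I, type-II and type-III loops."""
    if sys_type == 1:
        return K / (s * (s + a))
    if sys_type == 2:
        return K * (s + a) / s ** 2
    if sys_type == 3:
        return K * (s + a) * (s + b) / s ** 3
    raise ValueError("sys_type must be 1, 2 or 3")

def phase_input(s, kind, C=1.0):
    """Laplace transform of the step position, velocity or acceleration input."""
    if kind == "position":
        return C / s
    if kind == "velocity":
        return C / s ** 2
    if kind == "acceleration":
        return 2.0 * C / s ** 3
    raise ValueError("unknown input kind")

def steady_state_error(sys_type, kind, s=1e-6):
    """Approximate lim_{s->0} s*theta_e(s) by evaluating at a small real s."""
    return s * phase_input(s, kind) / (1.0 + loop_gain(s, sys_type))

if __name__ == "__main__":
    for sys_type in (1, 2, 3):
        print(sys_type, [steady_state_error(sys_type, k)
                         for k in ("position", "velocity", "acceleration")])
```

Values that shrink towards zero as *s* is reduced correspond to "Zero" in the table, values that stay fixed correspond to "Constant", and values that blow up correspond to "Increasing".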

#### Stability

The transient response is a function of loop stability, and the root locus technique for determining the position of system poles and zeroes in the s-plane is often used to visualize the stability of the continuous-time approximation of the system. The plot illustrates how the closed loop poles (roots of the characteristic equation) vary with loop gain. For stability, all poles must lie in the left half of the s-plane<sup>3</sup>. The relationship of the system poles and zeroes then determines the degree of stability. Fig. A.11 shows the pole locations and corresponding behaviors. As an example, for a phase locker design, one would be interested in having the poles as close as possible to the real axis in the left half-plane. Being close to the real axis means less oscillation, if any, and being in the left half-plane means that the response decays (stability). As one departs from the real axis, the oscillatory nature gradually becomes more significant, and as one departs from the imaginary axis, the decaying or growing nature gradually becomes more significant.

Figure A.11: Pole locations on s-plane and expected behaviors associated.

Similarly, to take the discrete nature of a CP-PLL into account, the z-plane must be considered, as seen in Fig. A.12. For a discrete system to be stable, the poles must reside within the *unit circle*. In practice, there can also be architecture-specific stability limits, such as the so-called *overload limit* associated with the traditional CP-PLL architecture. As a conservative approach, all the stability limits, namely z-plane, s-plane and architecture-specific, must be evaluated (see Chapter 5).

<sup>&</sup>lt;sup>3</sup>This is only true to the extend that the continuous approach is valid; in reality a PLL is a sampled or discrete system with a time varying nature.



Figure A.12: Theoretical corresponding stability regions on s- and z-planes (left, filled area) and pole locations on z-plane with expected behaviors associated (right).

#### A.2.2 Design Example

The design of a PLL typically involves determining the type of loop required (e.g. type-II, second-order), selecting the proper bandwidth depending on jitter specifications relating to the reference and the VCO which are the two dominant jitter contributors, and establishing the desired stability. Let us assume a system to have the following specifications:

- 1. Output frequency adjustable from 2 MHz to 3 MHz
- 2. Frequency steps of 100 kHz
- 3. Phase coherent frequency output
- 4. Lock-up time of 1 ms between channels
- 5. Overshoot of less than 20%

These specifications [106] can characterize a system function similar to a variable time base generator or a frequency synthesizer, both of which are widely used in practice. From the given specifications, the circuit parameters can now be determined. The forward and feedback transfer functions are given by

$$G(s) = K_p K_f K_o \tag{A.2.39}$$

$$H(s) = K_n \tag{A.2.40}$$

where  $K_n = 1/N$ . The programmable counter divide ratio  $K_n$  can be found from  $f_o = N f_i$ , thus:

$$N_{min} = \frac{f_{0min}}{f_i} = \frac{f_{0min}}{f_{step}} = \frac{2MHz}{100kHz} = 20$$
(A.2.41)

$$N_{max} = \frac{f_{0max}}{f_{step}} = \frac{3MHz}{100kHz} = 30 \tag{A.2.42}$$

$$K_n = \frac{1}{20} \to \frac{1}{30}$$
 (A.2.43)

A type-II system is required to produce a phase coherent output relative to the input. The root locus contour is shown in Fig. A.9 and the system step response is illustrated in Fig. A.10.

For the numerical calculations, let us assume that the VCO frequency change per control voltage is $K_v = 10 \cdot 10^6$ rad/s/V, so that the VCO gain is $K_o = K_v/s = \frac{10 \cdot 10^6}{s}$ rad/s/V. The *s* in the denominator converts the frequency characteristic of the VCO to phase, since phase is the integral of frequency. Similarly, let us assume that the gain constant of the phase detector is $K_p = 0.1$ V/rad.

The parameters thus far determined include  $K_p$ ,  $K_o$ ,  $K_n$  leaving only  $K_f$  as the variable for design. Since a type-II system is required for a phase coherent output, writing the loop transfer function and relating it to Eq. A.2.30 yields:

$$G(s)H(s) = \frac{K(s+a)}{s^2} = \frac{K_p K_v K_n K_f}{s}$$
(A.2.44)

Thus,  $K_f$  must take the form

$$K_f = \frac{s+a}{s} \tag{A.2.45}$$



Figure A.13: Active filter required.

in order to provide all of the necessary poles and zeroes for the required G(s)H(s). The circuit shown in Fig. A.13 yields the desired results<sup>4</sup>.  $K_f$  is expressed by

$$K_f = \frac{R_2 C s + 1}{R_1 C s}$$
(A.2.46)

where A is voltage gain of the amplifier, which is assumed to be high enough for the equation to hold.  $R_1$ ,  $R_2$  and C are then the variables used to establish the overall loop characteristics. These parameters relate to  $\omega_n$  and  $\xi$  as:

$$\omega_n^2 = \frac{K_p K_v}{R_1 C N} \tag{A.2.47}$$

$$2\xi\omega_n = \frac{K_p K_v R_2}{R_1 N} \tag{A.2.48}$$

The percent overshoot and settling time are now used to determine  $\omega_n$ . From Fig. A.10, it is seen that a damping ratio of 0.8 will produce a peak overshoot less than 20% and will settle within 5% at  $\omega_n t = 4.5$ . The required lock-up time is chosen as 1ms in the specifications. Therefore,

$$\omega_n = \frac{4.5}{t} = \frac{4.5}{0.001} = 4.5 \, krad/s \tag{A.2.49}$$

$$R_1 C = \frac{0.5 K_p K_v}{\omega_n^2 N} = 0.00102 \,s \tag{A.2.50}$$

can be calculated. Letting C = 0.5 $\mu$F, $R_1$ is found to be approximately 2 k$\Omega$. $R_2$ is calculated as

$$R_2 = \frac{2\xi}{C\omega_n} = 710\,\Omega\tag{A.2.51}$$

which finalizes the parameter calculation for the loop.
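The arithmetic of the design example can be retraced with a short script. The sketch below uses the expressions of Eqs. A.2.41, A.2.42 and A.2.49 to A.2.51 as printed, with the rounded gain constants of 0.1 V/rad and 10 Mrad/s/V quoted in the text; the function name and the dictionary keys are illustrative. With these rounded gains the product $R_1C$ evaluates to roughly 0.82 ms rather than the 1.02 ms of Eq. A.2.50, which suggests that slightly different, unrounded gain constants were used in [106]; that is an inference and is not stated in the text.

```python
def design_loop_filter(f_min=2e6, f_max=3e6, f_step=100e3,
                       k_p=0.1, k_v=10e6, xi=0.8, wn_t=4.5,
                       t_lock=1e-3, c=0.5e-6):
    """Retrace the loop filter design of the example (values as quoted)."""
    n_min = round(f_min / f_step)               # Eq. A.2.41
    n_max = round(f_max / f_step)               # Eq. A.2.42
    wn = wn_t / t_lock                          # Eq. A.2.49, rad/s
    r1c = 0.5 * k_p * k_v / (wn ** 2 * n_max)   # Eq. A.2.50 as printed, N = 30
    r1 = r1c / c                                # ohms, for the chosen capacitor
    r2 = 2.0 * xi / (c * wn)                    # Eq. A.2.51, ohms
    return {"n_min": n_min, "n_max": n_max, "wn": wn,
            "r1c": r1c, "r1": r1, "r2": r2}

if __name__ == "__main__":
    for name, value in design_loop_filter().items():
        print(name, value)
```

The value of $R_2$ depends only on $\xi$, *C* and $\omega_n$, so it is unaffected by the choice of gain constants.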

<sup>&</sup>lt;sup>4</sup>This result also shows why a charge-pump (CP) design is desirable as it achieves the same functionality with a passive filter design, obviating the active gain stage which is inevitable in non-CP designs.


Figure A.14: Root locus variation.



Figure A.15: VCO control signal transient response.

Since the loop gain is a function of the divide ratio $K_n$, the closed loop poles will vary their positions as $K_n$ varies. The root locus shown in Fig. A.14 illustrates the closed loop pole variation [106]. The loop is designed for the programmable counter N=30, thus its response for N=20 exhibits a wider bandwidth and a larger damping factor, which reduces both the lock-up time and the percent overshoot.

The frequency of the VCO is a function of its control voltage, therefore the system behavior can be monitored by directly probing the output of the loop filter. Fig. A.15 shows a measurement result of the design [106]. The behavior for the two cases, N equal to 20 and N equal to 30, is consistent with the expectations extracted from the root locus variation.
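The dependence on the divide ratio follows from Eqs. A.2.47 and A.2.48: with $R_1$, $R_2$ and *C* fixed, $\omega_n$ scales as $1/\sqrt{N}$ and, since $\xi = \omega_n R_2 C/2$, so does the damping factor. A minimal sketch of this scaling, starting from the design point $\omega_n = 4.5$ krad/s and $\xi = 0.8$ at N = 30, is given below; the function name is illustrative.

```python
import math

def scale_with_divider(wn_design, xi_design, n_design, n):
    """Scale omega_n and xi from the design divide ratio to another ratio n.

    Both quantities are proportional to 1/sqrt(N) when the filter
    components and the gain constants are kept fixed.
    """
    factor = math.sqrt(n_design / n)
    return wn_design * factor, xi_design * factor

if __name__ == "__main__":
    for n in (20, 25, 30):
        print(n, scale_with_divider(4500.0, 0.8, 30, n))
```

At N = 20 both the natural frequency and the damping factor are larger than at the design point, in line with the wider bandwidth and the reduced overshoot described above.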


## Appendix B

# Methods for Hand Calculations and Model Based Simulations

The hardware description languages (HDLs) have been developed for simulating, synthesizing and documenting hardware. Most commonly used HDLs are Verilog-HDL [4] and VHDL [8] together with their analog and mixed-signal extensions.

Verilog-A [3] is an analog HDL patterned after Verilog-HDL. Verilog-AMS [2] combines Verilog-HDL and Verilog-A into a mixed-signal (MS) HDL which is a super-set of both seed languages. Verilog-HDL provides event-driven modeling constructs whereas Verilog-A provides continuous-time modeling constructs. By combining the two, it becomes possible to write efficient mixed-signal behavioral models.

In this chapter, the cores of all the Verilog/Verilog-A models are provided, together with the cores of CaPPeLLo<sup>1</sup>, a tool developed for parametrizing and evaluating the high level behavior of the CP-PLL presented in Chapter 5. The Verilog models presented in this chapter correspond to what is actually simulated behaviorally to acquire the results given in Chapter 5. The cores of the C/C++ sources and Octave scripts used for the actual parametrization provide the full CaPPeLLo functionality.

<sup>1</sup>CaPPeLLo stands for CP-PLL parametrizer, developed in the C/C++ programming and Octave scripting languages in the framework of this thesis.

### B.1 Model Cores

For the CP-PLL simulations in Verilog, three different simulators are used to cross-check different modeling implementations: iVerilog [9], which is an open source Verilog simulator, and Verilog-HDL and Verilog-A within the Cadence [17] environment. Because of space constraints, however, only the Verilog-A models are presented. Unless otherwise stated, all the models are in Verilog-A and are simulated for behavioral verification, not for hardware synthesis. An iVerilog CP-PLL model can be found in [16]. The model provided in this section corresponds to the CP-PLL test bench seen in Fig. 5.20.

#### B.1.1 Reference Clock Generator

The 40 MHz reference signal for the PLL is generated by dividing a 4.8 GHz clock by 120. Listing B.1 shows the Verilog model used for the simulation, where the Verilog **defines** *N* and *LHCfreq* are the divide ratio of 120 and the reference frequency of 40 MHz, respectively. The output, *ClkH*, is inverted every half period, that is, at a rate of 2/period. The variable *step* is used to apply a phase step at a certain point in the simulation. Values in this module are set at start-up and never modified during the simulation.

Listing B.1: Model for high frequency clock generator.

```
module ClkHighFreq (ClkH);
    output ClkH;
    reg ClkH;
    real period, step;

    initial begin
        step   = 0;                     // phase step, zero by default
        ClkH   = 0;
        period = 1.0/(`LHCfreq*`N);     // period of the high frequency clock
    end

    // invert the output every half period
    always ClkH = #((step+period)/2) ~ClkH;
endmodule
```

Reference jitter, either sinusoidal or white, is introduced at the divide-by-120 stage. Listing B.2 shows the %120 divider with white jitter. Every time the input signal, *in*, crosses a threshold representing a rising edge transition (line 3), a random timing error is calculated (line 8) and introduced to the output, *out* (line 10). Jitter is introduced as a random variation, *dt*, in the delay of the output transition (lines 8 and 10). Sinusoidal jitter is introduced in a similar way.

Listing B.2: Model for %120 with white jitter.

```
analog begin
                                                                  1
    @(initial_step) seed = -311;
                                                                  2
    @(cross(V(in) - vth, dir, ttol)) begin
                                                                  3
         count = count + 1;
                                                                  4
         if (count >= ratio)
                                                                  5
             count = 0;
                                                                  6
         n = (2 * count >= ratio);
                                                                  7
         dt = jitter * rdist_normal(seed, 0, 1);
                                                                  8
    end
                                                                  9
    V(out) \iff transition(n ? vh : vl, td+dt, tt);
                                                                  10
end
                                                                  11
```
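The edge counting and jitter generation of Listing B.2 can also be tried outside the simulator. The following C++ sketch is only an illustration under stated assumptions: the struct `JitteredDivider`, its member names and the use of `std::mt19937` with `std::normal_distribution` in place of `$rdist_normal` are hypothetical and not part of the CP-PLL model.

```cpp
#include <random>

// Hypothetical software analogue of the jittered divider in Listing B.2.
struct JitteredDivider {
    int ratio;            // division ratio (120 in the thesis)
    double jitter;        // standard deviation of the timing error, in seconds
    int count = 0;        // input edges counted in the current output period
    bool level = false;   // logical output level (n in the listing)
    std::mt19937 rng;     // stand-in for the simulator's random source
    std::normal_distribution<double> unit{0.0, 1.0};

    JitteredDivider(int ratio_, double jitter_, unsigned seed)
        : ratio(ratio_), jitter(jitter_), rng(seed) {}

    // Call once per rising input edge. Returns the random delay dt that
    // would be added to the nominal output delay td for this edge.
    double onRisingEdge() {
        count = count + 1;
        if (count >= ratio) count = 0;
        level = (2 * count >= ratio);
        return jitter * unit(rng);
    }
};
```

With `JitteredDivider d(120, 1e-12, 1u);` each call to `d.onRisingEdge()` updates `d.level` and yields one sample of the timing error, so the output is high for half of every 120 input edges.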

### B.1.2 Phase/Frequency Detector

In brief, the detector operates as follows: at every rising edge of the PLL reference clock, *up* is set, and at every rising edge of the locally generated clock, that is, the output of the feedback divider, *down* is set. Whenever *up* = *down* = 1 holds, a *reset* signal is generated, and at each rising edge of this *reset* signal both *up* and *down* are cleared.

In the explicit module shown in Listing B.3, *vin\_if* and *vin\_lo* are the two input signals to be compared, whereas *sigout\_inc* and *sigout\_dec* correspond to the signals *up* and *down*, respectively (lines 6 and 7). After the declarations, the analog block starts (line 22). Every time an input signal makes a rising transition, a flag is set (lines 29-30 and 32-33). Depending on the phase difference between the two inputs, the state is set to same, behind or ahead (lines 34-48). The output signals *up* and *down*, or correspondingly *sigout\_inc* and *sigout\_dec*, are calculated accordingly (lines 49-60). Finally, the PFD outputs are assigned taking the rise and fall times into account (lines 61-64).

Listing B.3: Model for phase/frequency detector.

```
`include "discipline.h"                                   // 1
`include "constants.h"                                    // 2
`define behind 0                                          // 3
`define same 1                                            // 4
`define ahead 2                                           // 5
module freq_ph_detector(vin_if, vin_lo,                   // 6
                        sigout_inc, sigout_dec);          // 7
input vin_if, vin_lo;                                     // 8
output sigout_inc, sigout_dec;                            // 9
electrical vin_if, vin_lo, sigout_inc, sigout_dec;        // 10
parameter real vlogic_high = 1.2;                         // 11
parameter real vlogic_low = 0;                            // 12
parameter real vtrans = 0.6;                              // 13
parameter real tdel = 0 from [0:inf);                     // 14
parameter real trise = 1n from (0:inf);                   // 15
parameter real tfall = 1n from (0:inf);                   // 16
integer tpos_on_if;                                       // 17
integer tpos_on_lo;                                       // 18
real sigout_inc_val;                                      // 19
real sigout_dec_val;                                      // 20
integer state;                                            // 21
analog begin                                              // 22
    @ ( initial_step ) begin                              // 23
        sigout_inc_val = 0;                               // 24
        sigout_dec_val = 0;                               // 25
        state = `same;                                    // 26
    end                                                   // 27
    tpos_on_if = 0;                                       // 28
    @ ( cross(V(vin_if) - vtrans, +1) )                   // 29
        tpos_on_if = 1;                                   // 30
    tpos_on_lo = 0;                                       // 31
    @ ( cross(V(vin_lo) - vtrans, +1) )                   // 32
        tpos_on_lo = 1;                                   // 33
    if (tpos_on_if && tpos_on_lo) begin                   // 34
        state = `same;                                    // 35
    end else if (tpos_on_if) begin                        // 36
        if (state == `behind) begin                       // 37
            state = `same;                                // 38
        end else if (state == `same) begin                // 39
            state = `ahead;                               // 40
        end                                               // 41
    end else if (tpos_on_lo) begin                        // 42
        if (state == `ahead) begin                        // 43
            state = `same;                                // 44
        end else if (state == `same) begin                // 45
            state = `behind;                              // 46
        end                                               // 47
    end                                                   // 48
    if (tpos_on_if || tpos_on_lo) begin                   // 49
        if (state == `ahead) begin                        // 50
            sigout_inc_val = vlogic_high;                 // 51
            sigout_dec_val = vlogic_low;                  // 52
        end else if (state == `same) begin                // 53
            sigout_inc_val = vlogic_low;                  // 54
            sigout_dec_val = vlogic_low;                  // 55
        end else if (state == `behind) begin              // 56
            sigout_inc_val = vlogic_low;                  // 57
            sigout_dec_val = vlogic_high;                 // 58
        end                                               // 59
    end                                                   // 60
    V(sigout_inc) <+ transition(sigout_inc_val, tdel,     // 61
                                trise, tfall);            // 62
    V(sigout_dec) <+ transition(sigout_dec_val, tdel,     // 63
                                trise, tfall);            // 64
end                                                       // 65
endmodule                                                 // 66
```
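The state handling in lines 34-60 of Listing B.3 amounts to a small three-state machine. The C++ sketch below restates it under stated assumptions: `PfdState`, `PfdOutputs`, `nextState` and `outputsFor` are hypothetical names chosen for the example, and logic levels are reduced to booleans instead of the voltages *vlogic\_high* and *vlogic\_low*.

```cpp
// Hypothetical software analogue of the three-state detector in Listing B.3.
enum class PfdState { Behind, Same, Ahead };

struct PfdOutputs {
    bool up;    // corresponds to sigout_inc
    bool down;  // corresponds to sigout_dec
};

// Advance the state for one evaluation, given which inputs saw a rising edge.
inline PfdState nextState(PfdState s, bool edgeIf, bool edgeLo) {
    if (edgeIf && edgeLo) return PfdState::Same;
    if (edgeIf) {
        if (s == PfdState::Behind) return PfdState::Same;
        if (s == PfdState::Same)   return PfdState::Ahead;
    } else if (edgeLo) {
        if (s == PfdState::Ahead) return PfdState::Same;
        if (s == PfdState::Same)  return PfdState::Behind;
    }
    return s;  // no edge, or the state is already saturated
}

// Map a state to the pair of detector outputs.
inline PfdOutputs outputsFor(PfdState s) {
    return PfdOutputs{s == PfdState::Ahead, s == PfdState::Behind};
}
```

Because the state only changes when an edge arrives, deriving the outputs directly from the state gives the same result as the conditional update in the listing. A second consecutive edge on the same input leaves the state unchanged, which is what gives the detector its frequency sensitivity.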

### B.1.3 Charge-Pump

The analog model given in Listing B.4 has the following signature: *module charge\_pump (siginc, sigdec, vout, vsrc);* where *vout* is the output terminal to which charge is pumped or from which it is drawn, *vsrc* is the terminal that sources or sinks this charge, and *siginc* and *sigdec* are the logic signals that control the charge pump operation.

Listing B.4: Model for charge pump.

```
analog begin
    @ ( initial_step ) begin
        iout_val = iamp*i_mult(V(siginc), V(sigdec), vtrans);
    end
    @ (cross(V(siginc) - vtrans, 0)) begin
        iout_val = iamp*i_mult(V(siginc), V(sigdec), vtrans);
    end
    @ (cross(V(sigdec) - vtrans, 0)) begin
        iout_val = iamp*i_mult(V(siginc), V(sigdec), vtrans);
    end
    I(vsrc, vout) <+ transition(iout_val, tdel, trise, tfall);
end
```

The function *i\_mult* is a current multiplier which returns the direction (-1, 0 or 1) in which the charge should be pumped. Listing B.5 shows the explicit *i\_mult* function.

Listing B.5: Function *i\_mult* used within the charge-pump model.

```
analog function real i_mult;
    input inc;
    input dec;
    input vtrans;
    real inc;
    real dec;
    real vtrans;
    integer inc_high;
    integer dec_high;
    begin
        inc_high = inc > vtrans;
        dec_high = dec > vtrans;
        i_mult = 0.0;
        if (inc_high == dec_high) begin
            i_mult = 0.0;
        end else if (inc_high) begin
            i_mult = 1.0;
        end else if (dec_high) begin
            i_mult = -1.0;
        end
    end
endfunction
```

### B.1.4 White Jittered Voltage Controlled Oscillator

Listing B.6 shows the white jittered voltage controlled oscillator model, which has no process corner or duty cycle error parameters. This model is used to investigate the effect of white jitter at the VCO stage, but not that of process corners. The VCO output frequency is calculated (lines 3-4) from the input voltage level with a fixed VCO gain parameter. Lines 5 and 6 limit the oscillation frequency to keep the simulation numerically robust. Line 7 adds the random timing uncertainty. At every cycle, the numerical value of the random jitter is recalculated (lines 10-14) and introduced to the output at line 15.

Listing B.6: White jittered local voltage controlled oscillator model.

```
analog begin                                              // 1
    @(initial_step) seed = -561;                          // 2
    freq = (V(in) - vmin) * (fmax - fmin) /               // 3
           (vmax - vmin) + fmin;                          // 4
    if (freq > fmax) freq = fmax;                         // 5
    if (freq < fmin) freq = fmin;                         // 6
    freq = freq / (1 + dT * freq);                        // 7
    $bound_step(0.6/freq);                                // 8
    phase = 2*`M_PI*idtmod(freq, 0.0, 1.0, -0.5);         // 9
    @(cross(phase + `M_PI/2, +1, ttol) or                 // 10
      cross(phase - `M_PI/2, +1, ttol)) begin             // 11
        dT = `M_SQRT2*jitter*$rdist_normal(seed, 0, 1);   // 12
        n = (phase >= -`M_PI/2) && (phase < `M_PI/2);     // 13
    end                                                   // 14
    V(out) <+ transition(n ? vh : vl, 0, tt);             // 15
end                                                       // 16
```
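The arithmetic of lines 3-7 of Listing B.6 can be checked in isolation. Dividing the frequency by (1 + dT·freq) is equivalent to stretching the period from 1/freq to 1/freq + dT, which is how the timing error enters the oscillator. The C++ sketch below illustrates this under stated assumptions: the function name `jitteredFrequency` and its argument list are hypothetical, and the random draw of dT is left to the caller.

```cpp
#include <algorithm>

// Hypothetical helper mirroring lines 3-7 of Listing B.6.
// vin, vmin, vmax are in volts; fmin, fmax in hertz; dT in seconds.
inline double jitteredFrequency(double vin, double vmin, double vmax,
                                double fmin, double fmax, double dT) {
    // Linear voltage-to-frequency mapping with a fixed gain.
    double freq = (vin - vmin) * (fmax - fmin) / (vmax - vmin) + fmin;
    // Clamp to the allowed range [fmin, fmax].
    freq = std::min(std::max(freq, fmin), fmax);
    // The resulting period is 1/freq + dT.
    return freq / (1.0 + dT * freq);
}
```

For example, with a 1 V control range mapped onto 1 GHz to 3 GHz (values chosen only for illustration), a mid-range input gives 2 GHz, and a timing error of 1 ps lengthens the 500 ps period to 501 ps.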

To investigate the effect of the different process corner parameters extracted from the device level simulations of the VCO and D2S stages, a more involved model must be used. Listing B.7 shows parts of the iverilog (Icarus Verilog) implementation of the VCO, including all the process corner and duty cycle error parameters.

The model takes *Corner* and *Vctrl*, which stand for the index of the corners shown in Table 5.3 and the VCO control voltage in $\mu$V, respectively, and outputs *Period*, the period of the output clock signal (line 2). Every time the corner or the control voltage changes (line 4), the boundary limits are checked (lines 6-10 and lines 12-19), since the extracted parameters do not cover the full dynamic range (e.g. for the control voltage of the VCO, not 0-Vdd but 0.35 mV-0.70 mV). The control curve (lines 21-30) and duty-cycle error (line 31) parameters, which are extracted from device level simulations, are assigned for each corner. Finally, the output period is calculated (lines 37-49) and assigned (line 50). Three dots (...) represent skipped code.

Listing B.7: Local voltage controlled oscillator model including process corner and duty cycle error parameters.

```
...                                                       // 1
module curve (Corner, Vctrl, Period);                     // 2
...                                                       // 3
always @(Corner or Vctrl) begin                           // 4
  vctrl = Vctrl / 1000000.0;                              // 5
  if (Corner < 0 || Corner > 14) begin                    // 6
    $display("ERROR : 0 < Corner < 14");                  // 7
    $display("But it is %0d\n", Corner);                  // 8
    $finish;                                              // 9
  end                                                     // 10
  if (Corner == 0) begin                                  // 11
    if (vctrl < range4corner0_first || vctrl >            // 12
                            range4corner0_last) begin     // 13
      $display("\nERROR : %0f mV < Vctrl < %0f mV",       // 14
                            range4corner0_first,          // 15
                            range4corner0_last);          // 16
      $display("for Corner=%0d, but it is %0d uV\n",      // 17
                            Corner, Vctrl);               // 18
      $finish;                                            // 19
    end                                                   // 20
    a0 = -1.64755e+12;                                    // 21
    a1 = 1.63633e+13;                                     // 22
    a2 = -6.09447e+13;                                    // 23
    a3 = 9.30479e+13;                                     // 24
    a4 = 1.0;                                             // 25
    a5 = -1.78625e+14;                                    // 26
    a6 = 2.08998e+14;                                     // 27
    a7 = -7.78976e+13;                                    // 28
    a8 = 1.0;                                             // 29
    a9 = 1.0;                                             // 30
    dutyCycleError=dutyCycleError0;                       // 31
  end                                                     // 32
  if (Corner == 1) begin                                  // 33
...                                                       // 34
    dutyCycleError=dutyCycleError14;                      // 35
  end                                                     // 36
  freq = (a0 +                                            // 37
    a1 * vctrl +                                          // 38
    a2 * vctrl*vctrl +                                    // 39
    a3 * vctrl*vctrl*vctrl +                              // 40
    a4 * vctrl*vctrl*vctrl +                              // 41
    a5 * vctrl*vctrl*vctrl*vctrl +                        // 42
    a6 * vctrl*vctrl*vctrl*vctrl*vctrl +                  // 43
    a7 * vctrl*vctrl*vctrl*vctrl*vctrl*vctrl +            // 44
    a8 * vctrl*vctrl*vctrl*vctrl*vctrl*vctrl*             // 45
                                       vctrl +            // 46
    a9 * vctrl*vctrl*vctrl*vctrl*vctrl*vctrl*             // 47
                                 vctrl*vctrl);            // 48
  tmp = (10000000000000.0 / freq) * 1'b1;                 // 49
  Period <= tmp;                                          // 50
end                                                       // 51
endmodule                                                 // 52
```
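Listing B.7 writes each power of the control voltage as a repeated product. As a generic illustration of how a polynomial control curve and its range check can be evaluated, the C++ sketch below uses Horner's scheme. It is a sketch under stated assumptions: the helper names `evalPolynomial` and `controlVoltageInRange` are hypothetical, the polynomial is taken in the conventional form in which coefficient `c[k]` multiplies the k-th power of the voltage, and no coefficient or range value from the thesis is reproduced here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: evaluates c[0] + c[1]*v + c[2]*v^2 + ... using
// Horner's scheme, which needs one multiplication per coefficient.
inline double evalPolynomial(const std::vector<double>& c, double v) {
    double acc = 0.0;
    for (std::size_t i = c.size(); i-- > 0;) {
        acc = acc * v + c[i];
    }
    return acc;
}

// Hypothetical helper: scales the microvolt input by 1e-6, as line 5 of the
// listing does, and reports whether the result lies inside the range
// covered by the extracted parameters (limits given in the scaled unit).
inline bool controlVoltageInRange(double vctrlMicrovolt,
                                  double first, double last) {
    const double v = vctrlMicrovolt / 1000000.0;
    return v >= first && v <= last;
}
```

A caller would first test the range and only then evaluate the curve, mirroring the order of the checks and the frequency calculation in the listing.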

### B.1.5 Probes

The CP-PLL model employs various types of probe modules to dump data onto the local disk for further investigation. Listing B.8 shows the probe model that calculates the relative phase error (*out*) between two input signals (*in0* and *in1*). After the declarations and initial settings (first 25 lines), such as opening a local file, the transition instants of both input signals are stored (lines 27-28), and the phase differences are calculated and compared (lines 29-44). The output of the module is assigned at line 45, and the same value is dumped to the local file only once the loop is expected to be in the locked state (after $5\mu s$ in this module, line 46).

Listing B.8: Probe module calculating relative phase error.

```
module probe_A (in0, in1, out);                           // 1
parameter real thresh = 0.0;                              // 2
parameter integer dir = 1 from [-1:1] exclude 0;          // 3
input in0, in1;                                           // 4
output out;                                               // 5
voltage in0, in1, out;                                    // 6
real t0, t1, out_val;                                     // 7
integer probe_A;                                          // 8
integer waitForReferenceRiseInstant;                      // 9
real signalRiseInstant, referenceRiseInstant;             // 10
real signalFallInstant, referenceFallInstant;             // 11
real signalPhase, signalPhaseTmp;                         // 12
real phaseErrorLimit;                                     // 13
analog begin                                              // 14
  @(initial_step) begin                                   // 15
    probe_A = $fopen("/home/oc/pllDat/A/probe_A.dat");    // 16
    signalRiseInstant           = 0;                      // 17
    referenceRiseInstant        = 0;                      // 18
    signalFallInstant           = 0;                      // 19
    referenceFallInstant        = 0;                      // 20
    signalPhase                 = 0;                      // 21
    signalPhaseTmp              = 0;                      // 22
    waitForReferenceRiseInstant = 0;                      // 23
    phaseErrorLimit             = `upLimit;               // 24
  end                                                     // 25
  @(final_step) $fclose(probe_A);                         // 26
  t0 = last_crossing(V(in0) - thresh, dir);               // 27
  t1 = last_crossing(V(in1) - thresh, dir);               // 28
  @(cross(V(in0) - thresh, dir)) begin                    // 29
    signalRiseInstant = t0;                               // 30
    signalPhaseTmp = t0 - referenceRiseInstant;           // 31
    if (signalPhaseTmp <= phaseErrorLimit) begin          // 32
      signalPhase = signalPhaseTmp;                       // 33
      waitForReferenceRiseInstant = 0;                    // 34
    end else                                              // 35
      waitForReferenceRiseInstant = 1;                    // 36
  end                                                     // 37
  @(cross(V(in1) - thresh, dir)) begin                    // 38
    referenceRiseInstant = t1;                            // 39
    if (waitForReferenceRiseInstant) begin                // 40
      signalPhase = signalRiseInstant - t1;               // 41
      waitForReferenceRiseInstant = 0;                    // 42
    end                                                   // 43
  end                                                     // 44
  V(out) <+ transition(signalPhase);                      // 45
  if (t0 > 5u) $fstrobe(probe_A, signalPhase);            // 46
end                                                       // 47
endmodule                                                 // 48
```
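The edge handling of lines 29-44 in Listing B.8 decides whether a signal edge is measured against the previous or the next reference edge. The C++ sketch below restates that decision under stated assumptions: `PhaseProbe` and its member names are hypothetical, edge times are passed in by the caller, and the file output is omitted.

```cpp
// Hypothetical software analogue of the edge handling in Listing B.8.
struct PhaseProbe {
    double limit;                   // plays the role of phaseErrorLimit
    double signalRise = 0.0;        // last rising edge of in0
    double referenceRise = 0.0;     // last rising edge of in1
    double phase = 0.0;             // most recent phase error (signalPhase)
    bool waitForReference = false;  // measurement deferred to next reference edge

    explicit PhaseProbe(double limit_) : limit(limit_) {}

    // Rising edge of the signal input (in0) at time t.
    void onSignalEdge(double t) {
        signalRise = t;
        const double candidate = t - referenceRise;
        if (candidate <= limit) {
            phase = candidate;          // signal lags the last reference edge
            waitForReference = false;
        } else {
            waitForReference = true;    // too far: signal leads the next edge
        }
    }

    // Rising edge of the reference input (in1) at time t.
    void onReferenceEdge(double t) {
        referenceRise = t;
        if (waitForReference) {
            phase = signalRise - t;     // negative value: signal came first
            waitForReference = false;
        }
    }
};
```

A signal edge that follows the reference closely yields a positive phase error, whereas one that arrives shortly before the next reference edge is reported as a negative error once that reference edge occurs.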

Another type of probe, given in Listing B.9, calculates the instantaneous periods of the input clock signal. Every time the input signal voltage crosses a certain threshold (0.6 V in the module), the latest transition instant is stored (line 11) and compared to the previous one. Since rising transitions cannot be distinguished from one another, for the simplicity of the model the difference between the transition instants can be taken as

$$\mathrm{Period} = \mathrm{Difference} \bmod \mathrm{Ideal} \tag{B.1.1}$$

where *Period* is the output, *Difference* is the time between the last two rising transitions and *Ideal* is the expected target period of the input signal. Similarly, a window can be put around the expected period of the input signal, hence lines 12 to 19.

Listing B.9: Period meter probe module calculating instantaneous periods of the input clock.

```
analog begin                                              // 1
  @ ( initial_step ) begin                                // 2
    tearly = 0;                                           // 3
    tlatest = 1.0;                                        // 4
    tp = 0.6;                                             // 5
    counter = 0;                                          // 6
    periodMeter = $fopen("/home/oc/pllDat/pm.dat");       // 7
  end                                                     // 8
  @ ( final_step ) $fclose(periodMeter);                  // 9
  @ ( cross((V(in)-tp),+1) ) begin                        // 10
    tlatest = $abstime;                                   // 11
    if ( 100p < tlatest-tearly &&                         // 12
         tlatest-tearly < 300p )                          // 13
      pout_val = tlatest-tearly;                          // 14
    tearly = tlatest;                                     // 15
    counter = counter + 1;                                // 16
  end                                                     // 17
  if (counter == 1)                                       // 18
    FF(out) <+ 0.0;                                       // 19
  else                                                    // 20
    FF(out) <+ pout_val;                                  // 21
  if ($abstime > 1u)                                      // 22
    $fstrobe(periodMeter, pout_val);                      // 23
end                                                       // 24
```
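The window test of lines 12-15 in Listing B.9 keeps the last accepted period whenever an interval falls outside the expected range. The C++ sketch below shows the same bookkeeping under stated assumptions: `PeriodMeter` and its members are hypothetical names, and the window bounds are constructor arguments rather than the fixed 100 ps and 300 ps of the listing.

```cpp
// Hypothetical software analogue of the window test in Listing B.9.
struct PeriodMeter {
    double lower;           // lower bound of the accepted window
    double upper;           // upper bound of the accepted window
    double previous = 0.0;  // time of the previous rising edge (tearly)
    double period = 0.0;    // last accepted period (pout_val)

    PeriodMeter(double lower_, double upper_) : lower(lower_), upper(upper_) {}

    // Call at every rising edge with the current time.
    // Returns true when the measured interval was accepted.
    bool onRisingEdge(double now) {
        const double diff = now - previous;
        const bool accepted = (lower < diff) && (diff < upper);
        if (accepted) period = diff;  // otherwise keep the previous value
        previous = now;
        return accepted;
    }
};
```

Rejected intervals, such as the very first one measured from time zero, leave `period` untouched, so an outlier does not disturb the recorded sequence.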

## **B.2** Source Cores

CaPPeLLo is a hybrid application consisting of C/C++ sources and Octave scripts. The core given in this section performs the first-order hand calculations used to parametrize the loop dynamics. The output of this code is used within the Octave scripts.

In the first step, the model parameters which will be taken into account during the calculation are defined as in Listing B.10.

|       | Listing B.10: Model parameters in use. |   |
|-------|----------------------------------------|---|
| float | PI = 3.141592653589793;                | 1 |
| float | C1, C3, C3min, C3max, R, Tau, Rmax;    | 2 |
| float | Wn, Wi, Wz, Ksi, Icp, Bl, Kvco, Ko;    | 3 |
| float | K, proportionalTerm, integralTerm;     | 4 |
| bool  | isRmaxOk, isStable;                    | 5 |
| int   | N;                                     | 6 |

The second step is to acquire the input parameters from command line in the form of switches and make proper unit conversion suitable for calculation as seen in Listing B.11. This is done via going through the argv vector and evaluating each string member. The counter, optCtr, holds the current index number.

Listing B.11: Decoding input switches.

```
int SetParameters(int argc, char **argv) {
    if (argc <= 1) {
        PrintUsageMessage(argv[0]);
        3</pre>
```

| exit(0); }                                                                                                   | 4  |
|--------------------------------------------------------------------------------------------------------------|----|
| int $optCtr = 1;$                                                                                            | 5  |
| if $(\operatorname{strcmp}(\operatorname{argv}[1], "-?") = 0   $                                             | 6  |
| $\operatorname{strcmp}(\operatorname{argv}[1], "-\operatorname{help"}) = 0   $                               | 7  |
| strcmp(argv[1], "help") == 0 ) { | 8  |
| PrintUsageMessage(argv[0]); | 9  |
| exit(0); } | 10 |
| while (optCtr != argc) { | 11 |
| if (0 == strcmp(argv[optCtr], "-N")) { | 12 |
| optCtr++; | 13 |
| sscanf(argv[optCtr], "%d", &N); | 14 |
| optCtr++; | 15 |
| } else if (0 == strcmp(argv[optCtr], "-Wn")) { | 16 |
|  | 17 |
| } else if (0 == strcmp(argv[optCtr], "-Icp")) { | 18 |
| optCtr++; | 19 |
| sscanf(argv[optCtr], "%f", &Icp); | 20 |
| optCtr++; | 21 |
| } else { | 22 |
| PrintDUT(argv[0]); | 23 |
| PrintUsageMessage(argv[0]); | 24 |
| exit(1); | 25 |
| }} | 26 |
| return 0; } | 27 |
|                                                                                                              |    |
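
The parsing loop above advances `optCtr` and reads `argv[optCtr]` without first checking that a value actually follows the flag. A minimal sketch of a bounds-checked variant is given below. It is not the CaPPeLLo source: the `LoopOptions` structure and the helpers `ReadDouble` and `ParseOptions` are hypothetical names, all three values are stored as `double` for brevity, and `strtod` is used instead of `sscanf` so that malformed numbers can be rejected.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the three options shown in the listing. */
typedef struct {
    double N;    /* value following -N   */
    double Wn;   /* value following -Wn  */
    double Icp;  /* value following -Icp */
} LoopOptions;

/* Read the value that follows the flag at argv[*ctr].
   Returns 0 on success and advances *ctr past the flag and its value,
   -1 if the value is missing or is not a complete number. */
int ReadDouble(int argc, char **argv, int *ctr, double *out) {
    char *end = NULL;
    if (*ctr + 1 >= argc) return -1;          /* flag is the last argument */
    *out = strtod(argv[*ctr + 1], &end);
    if (end == argv[*ctr + 1] || *end != '\0') return -1;
    *ctr += 2;
    return 0;
}

/* Parse argv[1..argc-1]; returns 0 on success, -1 on any error. */
int ParseOptions(int argc, char **argv, LoopOptions *opt) {
    int ctr = 1;
    assert(opt != NULL);
    while (ctr < argc) {
        double *target;
        if (strcmp(argv[ctr], "-N") == 0)        target = &opt->N;
        else if (strcmp(argv[ctr], "-Wn") == 0)  target = &opt->Wn;
        else if (strcmp(argv[ctr], "-Icp") == 0) target = &opt->Icp;
        else return -1;                          /* unknown option */
        if (ReadDouble(argc, argv, &ctr, target) != 0) return -1;
    }
    return 0;
}
```

A caller would print the usage message and exit with a non-zero status whenever `ParseOptions` returns -1, which mirrors the final `else` branch of the listing.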

As the last step, since all the required input parameters are now available, the loop parameters can be calculated and an output can be generated, as shown in Listing B.12.

Listing B.12: Loop parameter calculation.

```
int Calculate() {
    /* Loop-filter capacitance from Wn^2 = Ko*Icp/(2*PI*N*C1). The squared
       Wn is required for agreement with the denominator in Listing B.16
       and with the values in Listing B.14. */
    C1 = (Ko * Icp) / (2 * PI * Wn * Wn * N);
    C3max = 0.1 * C1;
    C3min = 0.02 * C1;
    C3 = (C3min + C3max) / 2.0;
    Tau = (2 * Ksi) / Wn;
    R = Tau / C1;
    Wz = 2 * PI * (1.0 / (R * C1));
    K = (Ko * Icp * R) / (2 * PI * N);
    Rmax = (2 * PI * Wi * N) / (Ko * Icp);
    Bl = ((Wn / 2.0) * (Ksi + 1.0 / (4.0 * Ksi)));
    proportionalTerm = Icp * R;
    integralTerm = Icp / (C3 * Wi);
    if (R <= Rmax) isRmaxOk = true;
    else isRmaxOk = false;
    if (K * Tau < Wi * Tau) isStable = true;
    else isStable = false;
    return 0;
}
```
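
As a cross-check of the relations in Listing B.12, the sketch below recomputes the filter values for the first operating point of Listing B.14 (condition 0). It is a minimal sketch and not the CaPPeLLo source: the names `LoopParams`, `ComputeLoop` and `RelErr` are hypothetical, and the capacitance is obtained from $\omega_n^2 = K_o I_{cp} / (2 \pi N C)$, the relation implied by the denominator in Listing B.16.

```c
#include <assert.h>
#include <math.h>

#define PI 3.14159265358979323846

/* Hypothetical container for the derived loop-filter values. */
typedef struct {
    double C;    /* loop-filter capacitance [F]   */
    double R;    /* loop-filter resistance [Ohm]  */
    double Tau;  /* time constant R*C [s]         */
    double K;    /* loop gain [rad/s]             */
} LoopParams;

/* Ko [rad/s/V], Icp [A], N (divider ratio), wn [rad/s], Ksi (damping). */
LoopParams ComputeLoop(double Ko, double Icp, double N, double wn, double Ksi) {
    LoopParams p;
    assert(Ko > 0.0 && Icp > 0.0 && N > 0.0 && wn > 0.0 && Ksi > 0.0);
    p.C   = (Ko * Icp) / (2.0 * PI * N * wn * wn);  /* from wn^2 relation */
    p.Tau = (2.0 * Ksi) / wn;
    p.R   = p.Tau / p.C;
    p.K   = (Ko * Icp * p.R) / (2.0 * PI * N);
    return p;
}

/* Relative deviation of a from the reference value b. */
double RelErr(double a, double b) {
    return fabs(a - b) / fabs(b);
}
```

Called with the condition 0 inputs (`Ko = 219.911484e9`, `Icp = 5.0e-6`, `N = 120`, `wn = 3.141593e6`, `Ksi = 4.67`), the sketch should agree with the `C`, `R`, `Tau` and `K` values tabulated in Listing B.14 to within rounding.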

Finally, a possible *main* function that calls the above functions in a procedural manner is given in Listing B.13; PrintDUT(argv[0]), Output() and some other minor helper functions are not listed.

Listing B.13: A possible *main* function.

```
int main(int argc, char **argv) {
    SetParameters(argc, argv);
    Calculate();
    PrintDUT(argv[0]);
    Output();
    return 0;
}
```

## **B.3** Script Cores

CaPPeLLo is a hybrid application consisting of C/C++ and Octave. The Octave core given in this section corresponds to the highest-level behavioral model, expressed as a transfer function, which is simulated at the beginning of the project.


#### **B.3.1** Parametrization

The CP-PLL parameters calculated by CaPPeLLo are introduced (Listing B.14) by means of a *condition* variable, which can take 16 different values corresponding to the operating points seen in Fig. 5.18.

Listing B.14: Introducing corner parameters.

```
clear
condition = 0;
if (condition == 0)
  # CaPPeLLo -Kvco 35.0e9 -N 120 -Wi 40.0e6
  #          -Ksi 4.67 -Wn 500.0e3 -Icp 5.0e-6
  wn     = 3.141593e6;     N     = 120.00000;
  Ksi    = 4.670000;       Ko    = 219.911484e9;
  Icp    = 5.000000e-6;    C     = 147.760040e-12;
  R      = 20120.556641;   K     = 29342478.000000;
  Tau    = 2.973014e-6;    KTau2 = 87.235606;
  WiTau2 = 747.200016;
endif
...
if (condition == 15)
  # CaPPeLLo -Kvco 35.0e9 -N 120 -Wi 40.0e6
  #          -Ksi 4.67 -Wn 2000.0e3 -Icp 20.0e-6
  wn     = 12.566371e6;    N     = 120.000000;
  Ksi    = 4.670000;       Ko    = 219.911484e9;
  Icp    = 19.999999e-6;   C     = 36.940010e-12;
  R      = 20120.556641;   K     = 117369912.000000;
  Tau    = 0.743254e-6;    KTau2 = 87.235606;
  WiTau2 = 186.800004;
endif
```

After a condition is selected, the loop transfer function for this condition is constructed from the entered parameters, the system is checked for properties such as controllability and stability, and finally it is simulated to produce behavioral plots, namely the impulse and step responses together with the Bode plots and the root locus curve on the s-plane, as seen in Listing B.15.

Listing B.15: Constructing loop transfer function and generating behavioral plots.

| `num = [(R*C*wn^2) (wn^2)];` | 1  |
|------------------------------|----|
| `den = [(1/N) (2*Ksi*wn/N) (wn*wn/N)];` | 2  |
| `T = tf(num, den, 0, "ClkLHC", "ClkPLL/N");` | 3  |
| `sysout(T)` | 4  |
| `damp(T)` | 5  |
| `is_observable (T)` | 6  |
| `is_controllable (T)` | 7  |
| `is_stabilizable (T)` | 8  |
| `is_detectable (T)` | 9  |
| `is_stable (T)` | 10 |
| `wrange = logspace(log10(0.1), log10(10^10), 100);` | 11 |
| `impulse(T, 1, 0.2*10^-6, 1000); figure;` | 12 |
| `step(T, 1, 0.2*10^-6, 1000); figure;` | 13 |
| `bode(T, wrange); figure;` | 14 |
| `rlocus(T, 0.001, 0.0, 1.0); figure;` | 15 |

The same procedure is then repeated for the two dominant noise transfer functions, reference-to-output and VCO-to-output, as in Listing B.16.

Listing B.16: Constructing noise transfer functions and generating behavioral plots.

| `den = [1 (K) ((Ko*Icp)/(2*pi*N*C))];` | 9  |
|----------------------------------------|----|
| `Tvco2out = tf(num, den, 0, "VCO_Noise",` | 10 |
| `"PLL_Output_Noise");` | 11 |
| `bode(Tvco2out, wrange); figure;` | 12 |
| `sysout(Tvco2out)` | 13 |
| `damp(Tvco2out)` | 14 |
| `pause` | 15 |

#### B.3.2 Evaluation

Once the parametrization is complete, it must be verified by numerical simulations within an HDL environment. The jitter performance of the loop should be calculated numerically to confirm the transfer function representation, to the degree possible. The following script cores are used to numerically evaluate the jitter performance of the CP-PLL presented in Chapter 5.

The higher-level Octave script is given in Listing B.17. It reads a single-column data file, produced by Verilog-A probe modules, that contains the instantaneous periods of the measured clock signal. It then calculates the average, the standard deviation and the maximum normalized deviation of the periods, as well as the cycle-to-cycle (Listing B.18) and absolute (Listing B.19) jitter metrics, the latter two starting from an *offset*.

Listing B.17: Calculating statistics out of instantaneous period values.

| `clear;` | 1  |
|----------|----|
| `load out/periodMeter.dat;` | 2  |
| `offset = 100000;` | 3  |
| `T = Center = mean(periodMeter);` | 4  |
| `Sigma = std(periodMeter);` | 5  |
| `maxdT = max(abs(periodMeter-T))/T;` | 6  |
| `fprintf("Center = %.9g, 1/Center = %.9g\n",` | 7  |
| `T, 1/T);` | 8  |
| `fprintf("Sigma abs = %.9g, Sigma rel = %.9g%%\n",` | 9  |
| `Sigma, 100*Sigma/T);` | 10 |
| `fprintf("Max dT = %.9g%%\n", 100*maxdT);` | 11 |
| `fprintf("Absolute Jitter (aj) = %.9g\n",` | 12 |
| `aj(periodMeter, Center, offset));` | 13 |
| `fprintf("Cycle-to-cycle Jitter (cc) = %.9g\n",` | 14 |
| `cc(periodMeter, offset));` | 15 |
| `plot (periodMeter); figure;` | 16 |
| `hist (periodMeter, 200); figure;` | 17 |
| `pause;` | 18 |

Listing B.18: Calculating normalized cycle-to-cycle jitter.

```
function retval = cc(vector, offset)
  # Normalized cycle-to-cycle jitter of the period samples in vector,
  # evaluated from index offset onwards.
  retval = 0;
  if (nargin != 2)
    usage ("cc (vector, offset)");
  endif
  if (isvector (vector))
    len = length (vector);    # renamed: do not shadow the built-in length()
    i = offset;
    total = 0;                # renamed: do not shadow the built-in sum()
    while (i < len)
      difference = vector (i+1) - vector (i);
      total = total + difference * difference;
      i = i + 1;
    endwhile
    retval = sqrt (total) / len;
  else
    error ("ERROR : cc :: A vector argument ?");
  endif
endfunction
```

Listing B.19: Calculating normalized absolute jitter.

```
function retval = aj(vector, average, offset)
  # Normalized absolute jitter of the period samples in vector with
  # respect to average, evaluated from index offset onwards.
  retval = 0;
  if (nargin != 3)
    usage ("aj (vector, average, offset)");
  endif
  if (isvector (vector))
    len = length (vector);    # renamed: do not shadow the built-in length()
    i = offset;
    total = 0;                # renamed: do not shadow the built-in sum()
    while (i < len)
      deviation = vector (i) - average;
      total = total + deviation * deviation;
      i = i + 1;
    endwhile
    retval = sqrt (total) / len;
  else
    error ("ERROR : aj : A vector argument ?");
  endif
endfunction
```


## List of Figures

| Generic tracking principle                                                                          | 3                           |
|-----------------------------------------------------------------------------------------------------|-----------------------------|
| Time projection histogram of an individual pad                                                      | 4                           |
| A cosmic event                                                                                      | 5                           |
| Process diagram of bottom-up design                                                                 | 10                          |
| Process diagram of standard-cell top-down design                                                    | 10                          |
| $\mathrm{SU}(4)$ multiplets of baryons made of u, d, s, and c quarks. The 20-                       |                             |
| plet with an SU(3) octet (Left). The 20-plet with an SU(3) decuplet                                 |                             |
| (right). Both from PDG2007                                                                          | 24                          |
| Compass physics interest summary                                                                    | 25                          |
| COMPASS detector setup [5] in 2006                                                                  | 27                          |
| The M2 beam line [5]. $\ldots$                                                                      | 28                          |
| Simplified COMPASS DAQ architecture.                                                                | 32                          |
| Huygens principle applied on wave-fronts emitted due to a charged                                   |                             |
| particle traveling at low velocity (left), faster than light (right). From                          |                             |
| Eq. 2.2.                                                                                            | 35                          |
| Number of Cherenkov photons as a function of their wavelengths [18].                                | 36                          |
| RICH detection principle (left) and an on-line data quality monitoring $% \mathcal{A}(\mathcal{A})$ |                             |
| tool $[19]$ $[44]$ displaying an actual Cherenkov ring acquired within a                            |                             |
| test beam at CERN (right)                                                                           | 36                          |
| Architecture of the MAD-4.                                                                          | 39                          |
| Binary read-out architecture of a single channel in the CMAD                                        | 41                          |
| Charge sensitive amplifier, CSA                                                                     | 42                          |
| Architecture of the shaper with BLH                                                                 | 43                          |
|                                                                                                     | Generic tracking principle. |

| 3.5  | Example configuration for the FE                                                | 46 |
|------|---------------------------------------------------------------------------------|----|
| 3.6  | Lumped ac-coupling between the first two stages                                 | 47 |
| 3.7  | Transistor level implementation of the first stage, CSA                         | 49 |
| 3.8  | Transistor level implementation of the slew rate limited buffer, SRLB.          | 51 |
| 3.9  | Class AB biased OTA.                                                            | 52 |
| 3.10 | Transistor level implementation of shaper core amplifier                        | 54 |
| 3.11 | Binary weighted (left) and thermometer coded (right) architectures              |    |
|      | for ideal comparison.                                                           | 56 |
| 3.12 | Comparison results for 10-bits BWA and TCA showing the INLs (left               |    |
|      | column) and DNLs (right column) for both the architectures                      | 57 |
| 3.13 | 10-bits transistor-only R-2R architecture                                       | 58 |
| 3.14 | Layout $(140 \times 620 \mu m^2)$ of 10-bits transistor-only R-2R D/A where the |    |
|      | thick yellow layers on the top and on the bottom show the channel               |    |
|      | boundary.                                                                       | 59 |
| 3.15 | The opamp used as the output current-to-voltage converter in the D/A.           | 60 |
| 3.16 | Layout $(80x240\mu m^2)$ of the opamp used as the output current-to-            |    |
|      | voltage converter in the D/A                                                    | 61 |
| 3.17 | MC full scan.                                                                   | 63 |
| 3.18 | MC simulation result for the MSB transition                                     | 64 |
| 3.19 | Worst case INL and DNL in corner analysis                                       | 65 |
| 3.20 | LDO voltage reference.                                                          | 66 |
| 3.21 | Layout (260x140 $\mu m^2$ ) of LDO voltage reference                            | 67 |
| 3.22 | Implemented opamp-less band-gap reference                                       | 68 |
| 3.23 | Layout $(135 \times 300 \mu m^2)$ of opamp-less band-gap reference              | 68 |
| 3.24 | Implemented D/A biasing scheme                                                  | 69 |
| 3.25 | Layout $(120x320\mu m^2)$ of D/A biasing scheme                                 | 69 |
| 3.26 | Alternative D/A biasing scheme.                                                 | 69 |
| 3.27 | MC simulation result showing the current difference distribution of             |    |
|      | two arbitrary branches of Fig. 3.24; both process variations and de-            |    |
|      | vice mismatches are included                                                    | 71 |
|      | December $17, 20$                                                               | 07 |

| 3.28 | MC simulation result showing the current difference distribution of        |                    |
|------|----------------------------------------------------------------------------|--------------------|
|      | two arbitrary branches of Fig. 3.26; both process variations and de-       |                    |
|      | vice mismatches are included                                               | 72                 |
| 3.29 | Transistor level implementation of the comparator.                         | 73                 |
| 4.1  | Additional grounding/monitoring lines                                      | 75                 |
| 4.2  | Conceptual CMAD test setup                                                 | 76                 |
| 4.3  | The CMAD test setup                                                        | 77                 |
| 4.4  | Measurement results for adjustable gain of the preamplifier as a func-     |                    |
|      | tion of R and C binary D/A converter inputs                                | 78                 |
| 4.5  | Gain linearity of the preamplifier; the measurement and the linear fit     |                    |
|      | (upper plot) and the difference between fit and measurement. $\ . \ . \ .$ | 78                 |
| 4.6  | S-curves (upper) and their derivative (bottom). $\ldots$                   | 80                 |
| 4.7  | Measurement of the channel noise                                           | 81                 |
| 4.8  | A threshold scan measurement                                               | 81                 |
| 4.9  | Measured channel noise of a CMAD channel                                   | 82                 |
| 4.10 | Channel efficiency measurements both for the CMAD and the MAD-4            |                    |
|      | as a function of event rate                                                | 83                 |
| 4.11 | Layout (4.8x3.1 $mm^2$ , with pad-ring) of the CMAD                        | 84                 |
| 5.1  | LHC accelerator stages (not to scale) and main experiments                 | 87                 |
| 5.2  | Bunch structure.                                                           | 89                 |
| 5.3  | Control data frames of broadcast (top) and individually addressed          |                    |
|      | formats (bottom)                                                           | 92                 |
| 5.4  | Time division multiplexed bi-phase-mark encoding                           | 93                 |
| 5.5  | Architecture of GOL                                                        | 95                 |
| 5.6  | GBT based link architecture.                                               | 99                 |
| 5.7  | GBT line format.                                                           | 100                |
| 5.8  | Simplified GBT architecture.                                               | 100                |
| 5.9  | Electrical transmitter architecture.                                       | 101                |
| 5.10 | Broadcast network configuration.                                           | 102                |
| 5.11 | Fan-out network configuration                                              | 103<br>0 <b>07</b> |

| 5.12 Architecture of the serializer                                                                                                              |
|--------------------------------------------------------------------------------------------------------------------------------------------------|
| 5.13 Architecture of the charge-pump PLL                                                                                                         |
| 5.14 Theoretical and practical stability limits                                                                                                  |
| 5.15 Bode plots for continuous-time approximated CP-PLL transfer func-                                                                           |
| tion for a specific parameter set                                                                                                                |
| 5.16 Jitter transfer function from reference to PLL output                                                                                       |
| 5.17 Jitter transfer function from VCO to PLL output                                                                                             |
| 5.18 A subset of selectable operating points                                                                                                     |
| 5.19 A detailed parameter set                                                                                                                    |
| 5.20 Test-bench used for verilog simulations                                                                                                     |
| 5.21 Locking process of a CP-PLL showing the control voltage of the VCO                                                                          |
| (top), the phase error at the inputs of the PFD (middle), the phase                                                                              |
| error at the output of the PLL (bottom), and the simulation time                                                                                 |
| instance (vertical line at $4\mu s$ ) where the jitter statistics start to be                                                                    |
| collected                                                                                                                                        |
| 5.22~ Locking process of a CP-PLL in case of significant reference and VCO                                                                       |
| white noise (See text for details)                                                                                                               |
| 5.23 Instantaneous periods of the clock signals at the inputs of the PFD                                                                         |
| for $\omega_n = 100 kHz$ and $\xi = 1.0$ (top), $\omega_n = 100 kHz$ and $\xi = 0.3$                                                             |
| (bottom). $\ldots \ldots  |
| 5.24 Instantaneous periods of the clock signals at the inputs of the PFD                                                                         |
| for $\xi = 4.67$ and $\omega_n = 0.5$ MHz (top), $\xi = 4.67$ and $\omega_n = 1$ MHz (middle),                                                   |
| $\xi = 4.67$ and $\omega_n = 1.5$ MHz (bottom)                                                                                                   |
| 5.25 Jitter peaking. $\ldots$ 123                                                                                                                |
| 5.26 Introduced sinusoidal jitter (left-top), its histogram form (left-bottom),                                                                  |
| observed PLL output (right-top), and its histogram form (right-bottom).125                                                                       |
| 5.27 Introduced white jitter (left-top), its histogram form (left-bottom),                                                                       |
| observed PLL output (right-top), and its histogram form (right-bottom).126                                                                       |
| 5.28 Top level view of self-biased 3-stage differential ring-type VCO with                                                                       |
| its self-biased single-end converter                                                                                                             |
| December 17, 2007                                                                                                                                |

| 5.29 | Delay cell for self-biased 3-stage ring-type VCO (left) and the biasing             |     |
|------|-------------------------------------------------------------------------------------|-----|
|      | circuit (right)                                                                     | 128 |
| 5.30 | Implemented differential-to-single-end converter                                    | 129 |
| 5.31 | Inherently 50% duty cycle differential-to-single-end converter                      | 130 |
| 5.32 | Output signals (left, red curve belongs to the alternative d2s) of both             |     |
|      | d2s implementations and their derivatives (right, smaller height blue               |     |
|      | curve belongs to the alternative d2s). $\ldots$ $\ldots$ $\ldots$ $\ldots$ $\ldots$ | 132 |
| 5.33 | VCO control curves for 15 corners.                                                  | 133 |
| 5.34 | VCO gain curves for 15 corners with lines drawn at maximum                          | 134 |
| 5.35 | VCO total power consumption for 15 different process corner index. $$ .             | 135 |
| 5.36 | VCO output duty cycle error as a function of corner index                           | 136 |
| 5.37 | Divide-by-four part of the feedback divider and its timing diagram.                 | 137 |
| 5.38 | Implemented phase/frequency detector                                                | 137 |
| 5.39 | Implemented edge detector.                                                          | 138 |
| 5.40 | Implemented charge pump and low-pass filter                                         | 139 |
| 6.1  | Clock and data recovery operation.                                                  | 140 |
| 6.2  | Burst-mode network.                                                                 | 142 |
| 6.3  | A PLL-based CDR architecture with $fine~{\rm and}~coarse$ adjustments               | 144 |
| 6.4  | A burst-mode CDR architecture based on delaying the incoming data.                  | 145 |
| 6.5  | A burst-mode CDR architecture based on gated VCO                                    | 145 |
| 6.6  | A continuous-mode CDR architecture based on blind oversampling                      | 146 |
| 6.7  | A continuous-mode CDR architecture based on semi-blind oversam-                     |     |
|      | pling                                                                               | 147 |
| 6.8  | A burst-mode CDR architecture (left) based on FSM and its state                     |     |
|      | diagram (right)                                                                     | 148 |
| 6.9  | The burst-mode CDR architecture implemented                                         | 149 |
| 6.10 | Simplified circuit of burst-mode CDR for simulation                                 | 150 |
| 6.11 | Delay line circuit implemented                                                      | 151 |
| 6.12 | Schematic of gated VCO test structure                                               | 151 |
| 6.13 | Fast TSPC D-FF                                                                      | 152 |
| 6.14 | Layout of gated VCO $(10x4\mu m^2)$ test structure                                  | 153 |
|      | December 17, 2                                                                      | 007 |

| 6.15 | Delay line transient response |
|------|-------------------------------|
| 6.16 | Delay variation as a function of process corner where input signal height is 1 V and rise/fall times are chosen to be 20 ps |
| 6.17 | Burst-mode operation showing the input data stream (bottom), output of the gating circuit (ED), output of the local GVCO (recovered clock) and the recovered data stream (top), respectively |
| 6.18 | MC simulation results showing the frequency difference distribution between the two GVCOs (left-most) and the output frequency distributions of both the GVCOs (middle and right-most) when $V_{ctrl} = V_{dd}$ |
| 6.19 | MC simulation results showing the frequency difference distribution between the two GVCOs (left-most) and the output frequency distributions of both the GVCOs (middle and right-most) when $V_{ctrl} = V_{dd}/2$ |
| 6.20 | MC simulation results showing the frequency difference distribution between the two VCOs (left-most) and the output frequency distributions of both the VCOs (middle and right-most); see text |
| 6.21 | MC simulation result of differential VCO, showing the frequency difference distribution with $V_{ctrl}$ as the parameter |
| 6.22 | The GVCO test setup |
| 6.23 | The measured and MC simulated control curves of the single-ended GVCO |
| 6.24 | The measured output power spectrum of the single-ended GVCO |
| 6.25 | The measured overlapped waveform of the single-ended GVCO at the first rising edge after the one at which the oscilloscope triggers |
| 6.26 | The measured overlapped waveform of the single-ended VCO at which the oscilloscope triggers |
| 7.1  | The measurement result showing the offset problem; an identical setting resulting in different effective channel thresholds for different channels and different chips |
| 7.2  | Measured s-curve after equalization |
| A.1  | Pole locations of general second-order system |
| A.2  | Gain magnitude response of a low-pass second-order system for various damping factors |
| A.3  | Example single-loop feedback system |
| A.4  | Step response as a function of $\xi$ for a low-pass, type-I, second-order system |
| A.5  | Definitions of crossover frequency (A) and phase margin (B) |
| A.6  | Overshoot as a function of the damping factor for a second-order system |
| A.7  | Phase margin in degrees as a function of the damping factor for a second-order system |
| A.8  | Phase locked loop and the parameters used in the example |
| A.9  | Root locus of a type-II, second-order system |
| A.10 | Step response of a type-II, second-order system |
| A.11 | Pole locations on s-plane and expected behaviors associated |
| A.12 | Theoretical corresponding stability regions on s- and z-planes (left, filled area) and pole locations on z-plane with expected behaviors associated (right) |
| A.13 | Active filter required |
| A.14 | Root locus variation |
| A.15 | VCO control signal transient response |


## List of Tables

| 2.1 | Characteristics of the beam for the muon and the hadron programs |
|-----|------------------------------------------------------------------|
| 3.1 | Shaper performance |
| 4.1 | Properties of the CMAD |
| 5.1 | Jitter transfer function verification result with sinusoidal reference jitter ($\omega_n = 1\,\mathrm{MHz}$) |
| 5.2 | Jitter transfer function verification result for white reference jitter |
| 5.3 | Technology process corners used for parameter extraction |
| A.1 | Steady-state phase errors for various system types |


## Listings

| B.1  | Model for high frequency clock generator |
|------|------------------------------------------|
| B.2  | Model for %120 with white jitter |
| B.3  | Model for phase/frequency detector |
| B.4  | Model for charge pump |
| B.5  | Function `i_mult` used within charge-pump model |
| B.6  | White jittered local voltage controlled oscillator model |
| B.7  | Local voltage controlled oscillator model including process corner and duty cycle error parameters |
| B.8  | Probe module calculating relative phase error |
| B.9  | Period meter probe module calculating instantaneous periods of the input clock |
| B.10 | Model parameters in use |
| B.11 | Decoding input switches |
| B.12 | Loop parameter calculation |
| B.13 | A possible main function |
| B.14 | Introducing corner parameters |
| B.15 | Constructing loop transfer function and generating behavioral plots |
| B.16 | Constructing noise transfer functions and generating behavioral plots |
| B.17 | Calculating statistics out of instantaneous period values |
| B.18 | Calculating normalized cycle-to-cycle jitter |
| B.19 | Calculating normalized absolute jitter |

# Index

## A

accelerator operating frequency, 90; adjust-ability limit of PLL, 116; adjustable filter, 139; ALICE, 87; ATLAS, 87

## B

baby creation problem, 7; band-gap, 67, 68; bandwidth, 192; baseline restorer, 40; baseline stabilization, 42; beam extraction, 28; bias, 68; binary architecture, 55; binary weighted, 55, 56; bit error rate, BER, 5; blind over-sampling, 149; blind parsing, 151; bode plots, 219; bottom-up design methodology, 7, 9; broadcast network, 103; bunch, 92; bunch clock, 90; bunch filling, 90; burst-mode, 142; burst-mode signals, 155

## C

calorimeter, 32; CAMAC, 77; CaPPeLLo, 114, 215; CaPPeLLo sources, 202; carrier velocity, 60; cascode amplifier, 42, 48; CERN, 87; channel equalization, 169; channel noise, 84; charge sensitive amplifier, 42; charge sharing, 139; charge-pump PLL, 107; CHEOPS, 14; cherenkov effect, 33; cherenkov emission, 31; chip filling, 167; chiral perturbation theory, 16; class AB, 42, 53; clock and data recovery, 142; clock recovery, 143; CMAD, 41, 77; CMS, 38; collision fragmentation, 21; color force, 15; common drain, 50; common source, 48; COMPASS, 14; COMPASS magnet, 29; confinement phenomenon, 16; consecutive identical bits, 148, 156; constant latency, 100; continuous time approximation, 108; continuous-mode communication, 147; control curve, 133, 161; control loop, 108; control voltage, 114, 119; controller, 145; corner parameters, 66, 218; coupling constant, 16; cross-over frequency, 187; current drive capability, 138; current-mode, 56; custom design, 6; cycle-to-cycle jitter, 119, 125, 162, 220

## D

damping ratio, 187, 199; dark matter, 87; data acceptance, 144; data acquisition, 33, 89; data acquisition, DAQ, 2; dc balanced, 93; deep inelastic scattering, 19; delay cell, 128; DESY, 14; detection component, 3; device mismatch, 63; differential non-linearity, 56; differential-to-single-end converter, 130; digital-to-analog converter, 55; distortion, 62; distributions of frequency difference, 157, 161; dual loop CDR, 145; duty-cycle, 130, 137, 209; dynamic logic, 154

## E

edge detector, 151; efficiency, 85; electrical fan-out, 104; electrical transceiver, 102; EMC, 14; end of packet, 93; error correction, 98; ethernet standard, 100; exotic quantum numbers, 22; expected plateau, 80; experimental system development, 2

## F

finite-state machine, 150; folded cascode, 55; frequency domain, 183; frequency excursion, 110; frequency response, 184; frequency synthesizer, 105; front-end, 38; front-end channel, 41; front-end, FE, 2, 4; full width at half maximum, 81; full-swing, 128

## G

gated VCO, 130, 147, 151; gating signal, 156; GBT, 89, 100; generalized parton distribution, 21; granularity, 81; group theory, 24; guard ring, 40

## H

hamming, 93; hardware description language, HDL, 116, 202; heavy quark effective theories, 24; HERMES, 14, 19; higgs boson, 87; huygens principle, 34; hybrid CDR, 149; hysteresis, 74

## I

integral non-linearity, 56; integral term, 110, 120; intellectual property, 8; inter-digitized, 60; inverse kinematics, 23

## J

jitter, 90, 91; jitter peaking, 123; jitter probe, 117, 213; jitter suppression, 123; jitter transfer function, 111, 123

## K

kelvin divider, 55

## L

ladder, 60; large angle spectrometer, 26; large hadron collider, 38, 87; latch-up, 40; lateral field, 60; lattice quantum chromodynamics, 16; leading hadron, 19; least significant bit, 57; LHC upgrade, 94, 96, 97; LHCb, 87; line data rate, 100; line driver, 107; linearity, 80; lock-up time, 192, 199; locked state, 212; loop gain, 184; loop operating point, 110, 114; loop parametrization, 116; low drop-out voltage regulator, 56, 67; luminosity, 87

## M

M2 beam line, 28; MAD-4, 38; mismatch, 62, 63, 117; mixed signal design, 7; mixed-signal extension, 202; monte carlo, 62, 66, 73; most significant bit, 57; multi-anode photo-multiplier tube, 41; multi-wire proportional chamber, 31, 38; multiplexer, 107

## N

natural frequency, 109, 183; NMC, 14; noise transfer functions, 219; normalized loop gain, 110

## O

octave scripts, 202; offset between front-end channels, 76, 167; offset cancellation, 60; one-shot, 44; orbit signal, 90; order, 192; over-damped, 111; over-sampling CDR, 148; overshoot, 199

## P

packet-mode, 142; parameter extraction, 117, 132; particle identification, 35; peaking time, 44; perturbation theory, 23; perturbative quantum chromodynamics, 16; phase error, 121; phase locked loop, 89, 105, 107; phase locking, 121; phase margin, 187; phase/frequency detector, 119; photon-gluon fusion, 19; plant, 145; PLL, 143; PLL design example, 197; PLL operation, 191; PLL parametrization, 190, 215; power spectra, 162; pre-fabrication, 167; preamble, 143; preamplifier gain, 78; primakoff reaction, 23; probe modules, 212; probing, 80; process corner, 66, 117, 209; process independence, 129, 161; process variation, 63; proportional term, 110, 120; pseudo-random bit sequence, PRBS, 152

## Q

quantum chromodynamics, 16; quantum electrodynamics, 16; quark parton model, 19; quarks, 15; quiescent current, 52

## R

R-2R, 55; radiation hard, 96, 99; raw data, 2; re-timing, 156; read-out, 77; read-out channels, ROC, 2; redundancy scheme, 107; RICH detector, 26, 31, 33; ring image, 34; ring oscillator, 128; ripple on control voltage, 110; root locus, 199, 219; running coupling constant, 16

## S

s-curve, 80; s-plane, 196, 219; sampled nature, 108; scintillator, 32; second-order, 108, 192; semi inclusive, 19; serializer, 89, 105; settling-time, 199; shaper, 40; shunt loading, 110; sigmoid function, 80; sinusoidal jitter, 204; SLAC, 19; slew rate, 131; slew-rate limited non-linear buffer, 44, 50; slow control, 89; small angle spectrometer, 26; small-signal equivalent resistance, 60; SMC, 14; spin, 18; spin contribution, 19; stability limit, 108, 110, 111, 114; start of packet, 93; static phase error, 109, 117; steady state phase error, 195; step acceleration, 194; step position, 194; step response, 185; step velocity, 194; super LHC, 89, 94

## T

terminating resistor, 153; thermometer coded, 56; threshold scanning, 80, 82; time division multiplexing, 93, 105; time over threshold, 74; timing trigger and control, 89; triggering, 92; top-down design methodology, 9, 11; tracking principle, 3; trans-conductor, 43; transfer function, 218; triple-redundancy, 138; true single-phase clock, TSPC, 154; tunability, 42; type, 191

## U

un-locked condition, 195; under-damped, 186; unit interval, UI, 165

## V

VCO, 132; VCO gain, 133; verification, 8; verilog, 116; verilog HDL, 202; vertex reconstruction, 30; VHDL, 202; voltage buffer, 42; voltage controlled oscillator, 110; voltage follower, 139; voltage-mode, 56

## W

wave front, 34; white jitter, 204; white stimuli, 80

## Z

z-plane, 111
