10th Annual Soft Error Rate (SER) Workshop – Details

Detailed descriptions of the talks at the SER Workshop (preliminary) …

Below are summaries of each of the tutorials and talks (not in the order in which they will be given). Return to this page following the Workshop to download PDFs of the slides and access links to the lecture video recordings.

Presenter Title Details
Eric Crabill, Xilinx Prerecorded Tutorial: Single Event Effects (Please view before attending; link will be sent) This tutorial is a technical backgrounder on Single Event Effects (SEE) in semiconductor devices, to establish a baseline understanding of origins, effects, mitigation, and testing. Key points made in this presentation are:

  • SEE have a relatively long history and can affect all semiconductor devices.
  • SEE arise from environmental radiation and present a variety of undesired behaviors.
  • SEE mitigation is possible and SEE susceptibility can be measured.

    After this presentation, attendees will have general familiarity with radiation effects in semiconductor devices. With this background, they will be primed for the presentations that follow during the day.
  • 8:15 AM (PT) On-Site Registration — Coffee, tea
    Eric Prebys, UC Davis The Crocker Nuclear Laboratory (30 minutes) In the 1960s, the magnetic return yoke and pole pieces from Ernest Lawrence’s 1939 60″ cyclotron were moved to UC Davis, where they formed the basis of the significantly upgraded 76″ Crocker Nuclear Laboratory Cyclotron, which first took beam in 1966 and is still operating today. The Crocker Cyclotron can accelerate species up to alpha particles at variable energies up to a maximum of 67.5 MeV for protons. Over the years, a wide variety of research has been done at the cyclotron, including some very interesting environmental studies. Currently, the cyclotron is used to treat eye cancer for one week each month, and the rest of the time is primarily used for radiation effects studies, for which the lab charges an hourly rate. Recently, the university has made a new commitment to the cyclotron, in the hopes of expanding the research and educational program, in addition to improving service to our commercial customers. This talk will summarize the history and status of the cyclotron, as well as current plans for the future. We also hope to solicit input from the community about potential improvements to the facility.
    Joe Hupcey, Mentor The Crocker Nuclear Laboratory (30 minutes) Developing high-reliability FPGA designs demands mitigating single-event upsets (SEUs). A significant amount of effort is devoted to planning and designing SEU protection logic, but validation and testing of this protection is often limited. Functional simulation provides some level of confidence, but it is not exhaustive and is extraordinarily time-consuming. Code reviews and manual analysis of SEU protection often miss issues and are not automated. Wouldn’t it be great if there were a technology that could exhaustively test this logic in an automated way? Luckily, there is! Formal verification techniques and Sequential Logical Equivalency Checking (SLEC) can be used to exhaustively prove SEU mitigation logic across all configurations and the full state space. In this session we give an overview of these techniques and their application to fully verify SEU protection logic.
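To make the idea of exhaustive checking concrete, here is a toy sketch that uses brute-force enumeration rather than a formal tool or SLEC. The 2-bit counter, the triple-modular-redundancy (TMR) scheme, and all function names are hypothetical and chosen only for illustration. The script compares a protected design against an unprotected reference for every state, every input, and every possible upset location.

```python
from itertools import combinations, product


def majority(a, b, c):
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def reference_step(state, enable):
    """Unprotected reference: 2-bit counter that increments when enabled."""
    return (state + enable) % 4


def tmr_step(copies, enable):
    """Hypothetical protected design: vote three register copies, then update."""
    return reference_step(majority(*copies), enable)


def mismatches(num_upsets):
    """List every (state, enable, flips) where the TMR design and reference disagree."""
    sites = list(product(range(3), range(2)))  # (copy index, bit index)
    failures = []
    for state, enable in product(range(4), range(2)):
        for flips in combinations(sites, num_upsets):
            copies = [state, state, state]
            for copy, bit in flips:
                copies[copy] ^= 1 << bit  # inject the upset
            if tmr_step(copies, enable) != reference_step(state, enable):
                failures.append((state, enable, flips))
    return failures


print("single-upset mismatches:", len(mismatches(1)))
print("double-upset mismatches:", len(mismatches(2)))
```

Enumeration like this only scales to very small designs; formal tools aim to reach the same kind of all-cases conclusion without visiting each case explicitly.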
    Paula Chen, Xilinx 64 MeV Proton Single-Event Evaluation of Xilinx 20nm DDR4-IO Design (30 minutes) The single-event upset response of a customer memory interface design implemented in a Xilinx 20nm XCKU040 Field Programmable Gate Array (FPGA) is presented. Furthermore, a methodology to estimate a design’s FIT is discussed and validated using high-energy protons.
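As a rough illustration of how a beam-test result is commonly converted into a FIT estimate (a generic sketch, not the methodology presented in this talk): the cross-section is the observed event count divided by the beam fluence, and FIT is the cross-section multiplied by the field flux and by 10^9 hours. The function names, the example event count and fluence, and the default flux of 13 particles/cm²/h (a commonly quoted sea-level reference value) are assumptions for illustration, as is the use of protons as a stand-in for the field environment.

```python
def cross_section_cm2(events: int, fluence_per_cm2: float) -> float:
    """Event cross-section: observed events divided by particle fluence."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence_per_cm2


def fit_rate(sigma_cm2: float, flux_per_cm2_hr: float = 13.0) -> float:
    """Failures per 10^9 device-hours for an assumed field flux."""
    return sigma_cm2 * flux_per_cm2_hr * 1e9


# Hypothetical beam run: 50 design-level errors over 1e11 protons/cm^2.
sigma = cross_section_cm2(events=50, fluence_per_cm2=1e11)
print(f"cross-section = {sigma:.2e} cm^2, FIT = {fit_rate(sigma):.2f}")
```

The flux is kept as a parameter because the appropriate value depends on altitude, location, and the particle energy range considered.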
    Vinod Ambrose, Intel Soft Error Rate Measurements in Solid State Drives (30 minutes) While the flash media in Solid State Drives (SSD) are not vulnerable to particle-induced SEU, there are other components, like the controller, that are vulnerable. Not a lot of measurements have been made on SEU vulnerabilities in SSDs. Measurement results are presented that show large SEU rates resulting in surprisingly high Silent Data Corruption (SDC) and Detected Unrecoverable Errors (DUE) unless careful consideration is given to SSD design. Results include alpha and neutron data for current generation datacenter and client SSDs from Intel and other manufacturers.
    Robert Baumann, Radiosity Solutions Making the Grade: From COTS to Space-grade Electronics (40 minutes) Small satellites are the most rapidly growing space sector due to their shorter development cycles, smaller-scale development teams, and reduced fabrication and launch costs. Many of these small-sat projects are being driven by relatively new (to space) commercial “players” who favor using Commercial-off-the-Shelf (COTS) over higher quality and more reliable space-grade components. After a brief review of the space environment and the types of chronic and instantaneous radiation effects that it can induce, we consider unintentional radiation performance enhancements gained as a natural consequence of technology scaling as well as some examples of targeted solutions for space-grade electronics. We also provide a couple of sobering real-world examples of how normal manufacturing variation in COTS can potentially lead to space mission failures. After considering the quality, material, and test-coverage differences between the different grades from COTS to space-grade, we conclude with some observations about the suitability of each grade in the rapidly evolving space market.
    Sang Hoon Jeon, HanYang University Logic Upsets in DDR4 SDRAMs Using 480 MeV Protons (30 minutes) In this technical presentation, a soft error study of upsets in control logic is presented, using a 480 MeV proton beam on commercial DDR4 SDRAM components from two different manufacturers. Soft errors in logic are critical because the logic portion of DRAM designs occupies a larger area of the chip and technology down-scaling decreases the critical charge. Also, a single-event upset in the logic part of an SDRAM can cause thousands of bit flips that result in catastrophic failures. Moreover, some of the upset events in the control logic were not cleared by many read and write operations. A comparative study against DDR3 SDRAMs shows that DDR4 had a 45% higher single bit upset (SBU) cross-section than DDR3. Furthermore, to understand how the storage capacitance of down-scaled DDR technologies affects soft errors, soft error bits were compared to retention-weak bits. No evidence was found that retention-weak bits were more sensitive to soft errors.
    Chamkaur Ghag, University College London Ultra-Low Background Radioassay Facilities at the Boulby Underground Laboratory (30 minutes) The UK’s Boulby Underground Laboratory, located 1100 m underground, operates cleanrooms for the operation of rare-event search experiments as well as unique radio-assay facilities for the screening campaigns necessary in constructing such experiments. The facility now hosts 7 ultra-low background gamma spectroscopy detectors, including the world’s most sensitive at low energies, crucial for the direct assay of Pb-210. The facility also hosts an XIA UltraLow-1800 in the same location and as such we are able to probe both bulk and surface radon daughter contamination in materials down to unprecedented levels. Our radio-assay facilities at Boulby are complemented by our world-class mass spectrometry (ICP-MS) and radon emanation measurement facilities at University College London. With our ICP-MS facility we achieve less than 10 parts per trillion (g/g) sensitivity to U238 and Th232, with turnaround of less than a day, using leading microwave digestion infrastructure to rapidly assess almost any material. Our radon facility can assay materials down to emanation rates of 0.1 mBq of Rn222, in large volume chambers for any material type. This talk shall present these facilities and the opportunities for radio-assays of materials of relevance to the electronics and advanced materials sectors using a complete suite of in-house ultra-low background techniques currently deployed for world-leading science applications and now available also for industry.
    Y. Sawada, MMC Development of High-Precision and Sensitive 2π Gas Flow-Type Alpha Particle Counter (30 minutes) Along with the high-density integration of semiconductors in recent years, alpha count requirements for electronic materials such as solder have become stricter. Recently, some manufacturers demand 0.001 cph/cm², which is lower than the current highest grade of 0.002 cph/cm². In order to accurately measure the alpha count of ultra-low alpha materials, an extended measurement time is necessary using a low background counter. At the current background level (3 to 4 cph), measurement becomes possible once the limit of detection drops to the necessary level, which requires a measurement time of 100 hours or longer. However, since the limit of detection hardly changes beyond 100 hours, specs below 0.001 cph/cm² can’t be handled at the current background level. Lowering the background decreases the limit of detection, which leads to an improvement in accuracy. We developed a new counter (gas flow proportional type) with a background level of around 1 cph. As a result, the limit of detection decreased, improving the accuracy and sensitivity of the counter. Additionally, due to the decrease in background, the time required to reach the necessary level for 0.001 cph/cm² measurement was halved compared to the conventional counter.
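The interplay between background, measurement time, and limit of detection can be sketched with a textbook Currie-style approximation, in which the detection limit in counts is about 2.71 + 4.65·sqrt(B) for B expected background counts. This is an illustrative model only, not the authors' method, and it is not expected to reproduce the figures quoted above. The 1000 cm² sample area, the assumption of 100% counting efficiency, and the function names are all assumptions.

```python
import math


def detection_limit_cph_cm2(background_cph, hours, area_cm2):
    """Currie-style detection limit expressed as an emissivity (cph/cm^2)."""
    background_counts = background_cph * hours
    limit_counts = 2.71 + 4.65 * math.sqrt(background_counts)
    return limit_counts / (hours * area_cm2)


def hours_to_reach(target_cph_cm2, background_cph, area_cm2, max_hours=10000):
    """Smallest whole number of hours at which the limit drops to the target."""
    for hours in range(1, max_hours + 1):
        if detection_limit_cph_cm2(background_cph, hours, area_cm2) <= target_cph_cm2:
            return hours
    return None  # target not reachable within max_hours


AREA_CM2 = 1000.0  # assumed sample area
for background in (3.5, 1.0):  # conventional vs. lower-background counter
    needed = hours_to_reach(0.001, background, AREA_CM2)
    print(f"background {background} cph: about {needed} h to reach 0.001 cph/cm^2")
```

In this model the limit improves only as the inverse square root of the measurement time, so reducing the background is the more effective lever once measurements already run for tens of hours.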
    Paul Muller, IBM Multi Bit Upset Mitigation Using the Fall Off Curve (30 minutes) Energetic particles and radiation can negatively impact silicon-based electronics, e.g. high doses of X-Rays or energetic neutrons can cause threshold voltage shifts for a multitude of transistors. Even a single particle which usually impacts only a single memory cell (Single Event Upset, SEU), can – under specific conditions – impact a multitude of cells. In such a case we talk of a multi cell upset (MCU). For this to happen, the particle needs to travel in the plane of the active silicon over a distance which covers a multitude of cells. We are showing results of a proton beam experiment where this happens over a distance of up to 14 micrometers. A subset of the MCUs are multi cell upsets in one specific direction, namely the direction of the word line. These are called multi bit upsets (MBUs). While multi cell upsets in all other directions are easily detected and corrected, multi cell upsets in the direction of the word line are not, e.g. parity protection will fail for all even numbers of upsets. An error correction code (ECC) in the form of single error correct, double error detect (SEC-DED) fails to correct when two cells are upset, and starts failing to detect when three cells are upset.
    For these reasons, special attention needs to be paid to multi bit upsets (MBUs). The probability that, besides a first cell, any additional cells on the same word line get upset falls off quickly with increased physical distance. This probability can be shown for a variety of technology generations in a common Fall Off Curve. Besides the physical distance, the pattern of the stored data (checker board vs. alternating columns vs. all zero vs. all one, etc.) also plays a role. Interleaving is a technique to generate more SER robust array designs by increasing the physical distance between two cells protected by the same ECC word. This is achieved by interposing one or more cells protected by a different ECC word in between these two cells. If one cell is interposed, we talk about 2:1 interleaving, if three cells are interposed, we talk about 4:1 interleaving, and so on. The higher the interleaving ratio, the longer the physical distance between sensitive nodes, and thus the lower the probability of an MBU. The resulting designs are several orders of magnitude more MBU resilient. Thus, interleaving is a powerful tool to generate SER robust designs, and an alternative to more aggressive correction codes like double error correct, triple error detect (DEC-TED) codes or symbol ECC.
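The two effects described in this abstract, parity missing an even number of upsets and interleaving spreading a burst across ECC words, can be sketched in a few lines. This is a minimal illustration, not the speaker's model: the mapping of physical cell i to ECC word i mod N, the example data, and the function names are assumptions.

```python
def parity(bits):
    """Even-parity bit over a list of 0/1 values."""
    return sum(bits) % 2


def parity_detects(word, flipped_positions):
    """True if a single parity bit notices the given upsets."""
    corrupted = list(word)
    for position in flipped_positions:
        corrupted[position] ^= 1
    return parity(corrupted) != parity(word)


def upsets_per_ecc_word(first_cell, burst_length, interleave):
    """Count upsets landing in each ECC word for a run of adjacent cells
    on one word line, assuming cell i belongs to ECC word (i % interleave)."""
    counts = [0] * interleave
    for cell in range(first_cell, first_cell + burst_length):
        counts[cell % interleave] += 1
    return counts


word = [1, 0, 1, 1, 0, 0, 1, 0]
print("1 upset detected by parity: ", parity_detects(word, [3]))
print("2 upsets detected by parity:", parity_detects(word, [3, 4]))

# A burst of 3 adjacent cells starting at physical cell 10.
for ratio in (1, 2, 4):
    counts = upsets_per_ecc_word(10, 3, ratio)
    print(f"{ratio}:1 interleaving -> worst ECC word sees {max(counts)} upset(s)")
```

With SEC-DED, a word that sees at most one upset remains correctable, so in this sketch a three-cell burst becomes harmless once the interleaving ratio is at least as large as the burst length.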
    Balaji Narasimhan, Broadcom Soft Errors: From Technology Trends to System-Level Performance (30 minutes) Technology scaling trends in the soft error rate (SER) of SRAMs from planar to FinFET process nodes are presented. While scaling from planar to the first FinFET process provided a large reduction in per-bit SER, the subsequent scaling within FinFET process nodes results in SER reduction comparable to per-bit cell area reduction. On the other hand, FinFET processes show a strong exponential increase in the SER with reduction in bias, compared to a linear bias dependence for the planar process. Implications of these results from a system-level perspective are discussed. System-level SER test results for gigabit Ethernet transceiver devices are compared with estimates based on the technology SER data. Results indicate that most SEUs cause benign, recoverable packet errors, while the SER for more severe link-drop type events is significantly lower than estimates for this application. The talk highlights the importance of understanding the technology trends in SER as well as the true impact of soft errors from a system perspective, considering the intended product application.
    Mark Hanhardt, Sanford Underground Research Opportunities at Sanford Underground Research Facility (30 minutes) The Sanford Underground Research Facility provides an extremely low-background environment for conducting world-leading science experiments. An overview of the experiments and relevant fields of science will be presented. SURF has potential to house many additional experiments and welcomes applications from interested research/testing groups.
    Krishna Mohan, GlobalFoundries Demonstration of Soft Error Rate Robustness with Process Improvements and Material Changes – Case Studies (30 minutes) Semiconductor foundry customers increasingly have stringent Soft Error Rate (SER) reliability requirements, driven either by their end customers or by the demands of their own product-specific field applications. This was especially evident in mature micron to sub-micron nodes, where the SER requirements for foundry technology qualification were 1000 to 2000 FITs/Mb, aligned with the ITRS Roadmap. This paper presents case studies in which the foundry implemented process improvements and material changes to achieve SER robustness improvements ranging from >50% to two orders of magnitude on two technology nodes.
    Phil Oldiges, IBM Physics-Inspired Spreadsheet Models for Latch and SRAM SER Screening (30 minutes) (summary to be added)
    4:30 PM (PT) Close of Workshop