HIP: Heterogeneous-computing Interface for Portability
HIP Programming Guide

Host Memory

Introduction

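hipHostMalloc allocates pinned host memory that is mapped into the address space of all GPUs in the system. There are two use cases for this host memory: faster host-to-device and device-to-host data transfers, and zero-copy GPU access, where the GPU reads and writes the host memory directly instead of working on a device copy.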

Memory allocation flags

hipHostMalloc always sets the hipHostMallocPortable and hipHostMallocMapped flags. Both usage models described above use the same allocation flags, and the difference is in how the surrounding code uses the host memory. See the hipHostMalloc API for more information.

Coherency Controls

ROCm defines two coherency options for host memory: coherent and non-coherent.

HIP provides the developer with controls to select which type of memory is used, via allocation flags passed to hipHostMalloc and the HIP_HOST_COHERENT environment variable.

Visibility of Zero-Copy Host Memory

Coherent host memory is automatically visible at synchronization points. Visibility of non-coherent host memory depends on the synchronization API used, as the following table shows:

| HIP API | Synchronization Effect | Fence | Coherent Host Memory Visibility | Non-Coherent Host Memory Visibility |
| --- | --- | --- | --- | --- |
| hipStreamSynchronize | host waits for all commands in the specified stream to complete | system-scope release | yes | yes |
| hipDeviceSynchronize | host waits for all commands in all streams on the specified device to complete | system-scope release | yes | yes |
| hipEventSynchronize | host waits for the specified event to complete | device-scope release | yes | depends - see below |
| hipStreamWaitEvent | stream waits for the specified event to complete | none | yes | no |

hipEventSynchronize

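Developers can control the release scope for hipEvents. By default, a recorded event performs a device-scope release, which makes memory visible to other commands executing on the same device.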

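A stronger system-level fence can be specified when the event is created with hipEventCreateWithFlags, by passing the hipEventReleaseToSystem flag so that a system-scope release is performed when the event is recorded.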

Summary and Recommendations:

Device-Side Malloc

HIP-Clang currently does not support device-side malloc and free.

Use of Long Double Type

In HIP-Clang, long double is the 80-bit extended-precision format on x86_64, which AMDGPU does not support. HIP-Clang therefore treats long double as IEEE double on AMDGPU. Using long double in HIP source code does not cause problems as long as long double data is not transferred between host and device. For the same reason, long double should not be used as a kernel argument type.
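One way to respect this restriction is to narrow long double data to double on the host before it goes anywhere near a device buffer or kernel argument. The following is a minimal host-only sketch of that idea. The helper name narrow_for_device is hypothetical and is not part of HIP, and the sketch assumes that losing the extra x86_64 precision is acceptable for the application.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of HIP): narrow host-side long double values
// to double on the host, so that host and device code agree on the type of
// every value that later crosses the host/device boundary.
inline std::vector<double> narrow_for_device(const std::vector<long double>& host_values) {
    std::vector<double> narrowed;
    narrowed.reserve(host_values.size());
    for (long double v : host_values) {
        narrowed.push_back(static_cast<double>(v));  // explicit, possibly lossy
    }
    return narrowed;
}

// Usage sketch: values exactly representable as double survive unchanged.
inline void narrow_for_device_example() {
    const std::vector<double> narrowed = narrow_for_device({1.0L, 2.5L});
    assert(narrowed.size() == 2);
    assert(narrowed[1] == 2.5);
}
```

The resulting double buffer, rather than the original long double data, is what would then be copied to the device or passed to a kernel.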

FMA and contractions

By default, HIP-Clang assumes -ffp-contract=fast. On x86_64, FMA is off by default because the generic x86_64 target does not support FMA. To turn on FMA on x86_64, use either -mfma or, on CPUs that support FMA, -march=native.
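To make the numerical effect of contraction concrete, the host-only sketch below compares a multiply followed by an add (two roundings) with a fused multiply-add (one rounding). The function names are illustrative and not part of HIP, and the sketch assumes IEEE 754 double arithmetic with the default round-to-nearest mode.

```cpp
#include <cassert>
#include <cmath>

// Multiply, round the product to double, then add: what is computed when the
// expression is not contracted. The volatile store keeps the compiler from
// fusing the two operations itself.
inline double separate_mul_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c with a single rounding at the end, as a
// contracted expression would be evaluated.
inline double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Worked example in which the two forms give different results.
inline void contraction_example() {
    const double eps = std::ldexp(1.0, -27);  // 2^-27
    const double a = 1.0 + eps;
    const double b = 1.0 - eps;               // a * b is exactly 1 - 2^-54
    const double c = -1.0;
    // The product rounds to 1.0, so the separate form yields exactly 0.
    assert(separate_mul_add(a, b, c) == 0.0);
    // The fused form keeps the low-order term and yields -2^-54.
    assert(fused_mul_add(a, b, c) == -std::ldexp(1.0, -54));
}
```

Neither result is wrong. They are two differently rounded evaluations of the same source expression.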

When contractions are enabled and the CPU has not enabled FMA instructions, the GPU can produce different numerical results than the CPU for expressions that can be contracted. Use a tolerance when comparing floating-point results.
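A tolerance-based comparison for checking GPU results against a CPU reference might look like the sketch below. The helper nearly_equal and its default tolerances are assumptions chosen for illustration, not values prescribed by HIP. Appropriate tolerances depend on the algorithm and the magnitude of the data.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of HIP): true when a and b agree within a
// relative tolerance, with an absolute floor for values near zero.
// Returns false if either value is NaN.
inline bool nearly_equal(double a, double b,
                         double rel_tol = 1e-9, double abs_tol = 1e-12) {
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(abs_tol, rel_tol * scale);
}

// Usage sketch: a rounding-level difference passes, a real mismatch does not.
inline void nearly_equal_example() {
    assert(nearly_equal(0.0, -5.6e-17));
    assert(!nearly_equal(1.0, 1.001));
}
```

The absolute floor matters because a purely relative test rejects any nonzero difference when the reference value is exactly zero.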

See also: "Supported Clang Options" (clang_options.md).