NVIDIA NCP-AIN Practice Test 3: Results
Question 1 of 60
An AI infrastructure team is deploying a 128-node H100 GPU cluster requiring 800G connectivity between compute nodes and storage arrays. They're evaluating NVIDIA Spectrum-X integration with SN5000 series switches. What is the critical architectural consideration when integrating SN5000 switches into their existing Spectrum-2 (SN3000 series) fabric for GPU-to-storage traffic?
Explanation:
Integrating SN5000 series 800G switches with existing Spectrum-2 infrastructure requires careful attention to congestion control consistency. The speed differential between 800G and 100G links creates potential congestion points where traffic converges. Uniform ECN marking thresholds and PFC configurations across all switch generations ensure lossless Ethernet behavior critical for RDMA over RoCE v2 used in GPU communication. This prevents cascading congestion effects that degrade multi-GPU training performance and GPU-to-storage throughput in AI clusters.
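The uniformity requirement can be sanity-checked with a small script. This is a toy consistency checker, not a vendor tool; the switch names and threshold values below are illustrative placeholders, not real defaults.

```python
# Toy consistency check: ECN marking thresholds and PFC priorities should
# match across switch generations in a mixed SN3000/SN5000 fabric.
# All names and values here are illustrative, not vendor defaults.
switches = {
    "spine-sn5600-1": {"ecn_min_kb": 150, "ecn_max_kb": 1500, "pfc_priorities": (3,)},
    "leaf-sn3700-1":  {"ecn_min_kb": 150, "ecn_max_kb": 1500, "pfc_priorities": (3,)},
    "leaf-sn3700-2":  {"ecn_min_kb": 150, "ecn_max_kb": 3000, "pfc_priorities": (3,)},
}

def find_drift(switches):
    """Return switches whose ECN/PFC settings differ from the first entry."""
    baseline = next(iter(switches.values()))
    return [name for name, cfg in switches.items() if cfg != baseline]

print(find_drift(switches))  # ['leaf-sn3700-2'] deviates on ecn_max_kb
```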
Question 2 of 60
A network team plans to upgrade firmware on 40 spine switches in their datacenter fabric. They need to capture the current network state, perform the upgrade, and validate that routing protocols and interface states return to baseline. Which NetQ workflow accomplishes this pre/post change verification?
Explanation:
NetQ's snapshot-based change validation workflow is the standard approach for pre/post verification. By capturing complete network state before changes (routing protocols, interface status, BGP sessions, EVPN routes), performing maintenance, then capturing post-change state, teams can systematically compare snapshots to validate successful reconvergence. This methodology identifies configuration drift, missing BGP neighbors, or interface state discrepancies, providing comprehensive change impact assessment beyond what alert monitoring or predefined validation checks offer.
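The compare step can be sketched as a plain diff of two state captures. This is a minimal illustration of the snapshot-comparison idea, not the NetQ data model; the devices and counters are invented for the example.

```python
# Toy pre/post snapshot comparison in the spirit of NetQ's workflow:
# capture state before the upgrade, capture again after, then diff.
# Device names and counts are illustrative; real snapshots cover far more.
pre = {
    "bgp_neighbors": {"spine01": 32, "spine02": 32},
    "interfaces_up": {"spine01": 64, "spine02": 64},
}
post = {
    "bgp_neighbors": {"spine01": 32, "spine02": 30},  # two sessions missing
    "interfaces_up": {"spine01": 64, "spine02": 64},
}

def diff_snapshots(pre, post):
    """List (check, device, before, after) tuples where state changed."""
    changes = []
    for check in pre:
        for device in pre[check]:
            if pre[check][device] != post[check].get(device):
                changes.append((check, device, pre[check][device], post[check][device]))
    return changes

print(diff_snapshots(pre, post))  # flags spine02's missing BGP sessions
```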
Question 3 of 60
A multi-node AI training cluster experiences communication bottlenecks during AllReduce operations across 64 H100 GPUs. Which SHARP architecture component should be implemented to reduce data movement and improve collective operation efficiency in the InfiniBand fabric?
Explanation:
SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) implements in-network computing by embedding aggregation engines directly into InfiniBand switch ASICs. These engines perform MPI reduction operations (AllReduce, Reduce, Broadcast) on data as it traverses the network fabric, eliminating the need for data to reach host CPUs or GPUs for intermediate aggregation. This architecture reduces network traffic by up to 95% and improves collective operation latency by 5-10x compared to traditional host-based approaches, making it critical for large-scale multi-node GPU training workloads.
Question 4 of 60
Your team is training a 175B parameter LLM across 64 H100 GPUs distributed over 8 DGX nodes. Network bandwidth utilization is at 85% during all-reduce operations, causing training slowdowns. Which network scaling approach would MOST effectively reduce communication overhead?
Explanation:
Network scaling for large model training requires addressing physical bandwidth limitations. With 85% utilization on HDR InfiniBand during all-reduce operations across 8 nodes, upgrading to NDR InfiniBand (400 Gbps) directly doubles available bandwidth, providing headroom for efficient gradient synchronization. This hardware upgrade is more effective than algorithmic optimizations when bandwidth saturation is the bottleneck.
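A back-of-envelope model makes the doubling concrete: ring all-reduce moves roughly 2(N-1)/N times the gradient size per rank, so when the operation is bandwidth-bound, doubling link speed roughly halves synchronization time. The efficiency factor below is an assumption, not a measured value.

```python
def allreduce_time_s(model_bytes, n_gpus, link_gbps, efficiency=0.85):
    """Rough bandwidth-bound estimate for ring all-reduce.
    Each rank sends/receives ~2*(N-1)/N * S bytes; `efficiency` is an
    assumed protocol/overlap factor, not a measured value."""
    volume = 2 * (n_gpus - 1) / n_gpus * model_bytes
    return volume / (link_gbps * 1e9 / 8 * efficiency)

grads = 175e9 * 2  # 175B parameters in fp16 ~ 350 GB of gradients
hdr = allreduce_time_s(grads, 64, 200)  # HDR: 200 Gbps links
ndr = allreduce_time_s(grads, 64, 400)  # NDR: 400 Gbps links
print(round(hdr / ndr, 2))  # 2.0 -> doubling link speed halves the estimate
```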
Question 5 of 60
What best describes the primary function of NVIDIA BlueField SuperNIC in Spectrum-X architecture?
Explanation:
NVIDIA BlueField SuperNIC is a Data Processing Unit (DPU) that offloads networking, storage, and security tasks from host processors in Spectrum-X environments. By handling RDMA over Converged Ethernet (RoCE), congestion control, and packet processing independently, it frees GPU and CPU resources for AI computation while ensuring optimized network performance for distributed training workloads.
Question 6 of 60
What is the primary purpose of 400G/800G Ethernet in next-generation AI infrastructure?
Explanation:
400G/800G Ethernet provides the ultra-high bandwidth necessary for next-generation AI infrastructure, specifically addressing multi-node distributed training and inference workloads. As GPU clusters scale beyond single nodes with H100/H200 systems, these Ethernet speeds minimize network bottlenecks during intensive collective communication operations. This enables efficient scaling of large language model training and high-throughput inference deployments across multiple DGX systems or GPU nodes.
Question 7 of 60
What is load balancing in the context of Adaptive Routing for InfiniBand fabrics?
Explanation:
Load balancing in Adaptive Routing distributes network traffic across multiple available paths in the InfiniBand fabric based on real-time congestion information. This technique optimizes bandwidth utilization by preventing any single path from becoming a bottleneck while other paths remain underutilized, ensuring efficient collective communication operations critical for multi-GPU training workloads.
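The selection idea can be sketched in a few lines. This is a toy model of congestion-aware forwarding, not switch firmware; real adaptive routing runs in hardware per packet, and the queue depths below are invented.

```python
# Toy model of congestion-aware path selection: among equal-cost egress
# ports, forward on the one with the shallowest queue. Values illustrative.
def pick_port(queue_depths):
    """Return the port whose egress queue is currently least occupied."""
    return min(queue_depths, key=queue_depths.get)

ports = {"port1": 870, "port2": 120, "port3": 430}  # queued cells per port
print(pick_port(ports))  # port2 -> traffic shifts away from hot paths
```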
Question 8 of 60
What is the primary purpose of the ibstat command in InfiniBand fabric management?
Explanation:
The ibstat command is a fundamental diagnostic tool for verifying local InfiniBand port status. It displays port state (Active/Down/Init), link width, speed, physical state, and HCA information. This makes it the first-line troubleshooting command for identifying local connectivity issues before investigating fabric-wide problems with other tools.
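For scripted health checks, the key fields are easy to pull out of the command's output. The sample below is an abridged, illustrative transcript (real `ibstat` output includes more fields); the parser is a simple sketch.

```python
# Parsing key fields from abridged, illustrative `ibstat` output.
sample = """\
CA 'mlx5_0'
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 200
                Base lid: 12
"""

def parse_port(text):
    """Collect 'Key: Value' lines into a dict (first-match wins not needed here)."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

port = parse_port(sample)
print(port["State"], port["Rate"])  # Active 200
```

In practice the text would come from `subprocess.run(["ibstat"], capture_output=True)` on a host with an HCA installed.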
Question 9 of 60
Which statement best describes the security capabilities provided by NVIDIA BlueField DPUs in data center environments?
Explanation:
NVIDIA BlueField DPUs deliver comprehensive security through hardware-based isolation and encryption acceleration. They provide tenant isolation using dedicated hardware partitioning, offload encryption protocols (IPsec, TLS) to cryptographic accelerators, and implement secure boot with trusted execution environments. This architecture removes security processing burden from host CPUs while enhancing overall data center security posture through purpose-built hardware.
Question 10 of 60
What is the primary purpose of the QM8700 architecture in NVIDIA Quantum switches?
Explanation:
The QM8700 is NVIDIA's switch architecture designed for HDR 200 Gbps InfiniBand networking, providing 40 ports per switch for building high-performance AI training clusters. It enables low-latency, high-bandwidth communication between compute nodes through features like adaptive routing, congestion control, and GPUDirect RDMA support, which are essential for efficient multi-node distributed training with NCCL.
Question 11 of 60
An AI infrastructure team is deploying a 128-GPU cluster for distributed LLM training with H100 GPUs. They need to configure Spectrum-X to minimize collective communication latency for NCCL AllReduce operations. Which Spectrum-X configuration approach optimizes AI workload performance?
Explanation:
Spectrum-X is NVIDIA's AI-optimized Ethernet platform combining Spectrum-4 switches with adaptive routing and RoCE v2 for lossless transport. For distributed training, the optimal configuration enables adaptive routing to dynamically avoid congestion, combined with RoCE v2 and PFC on lossless traffic classes. This architecture minimizes tail latency for NCCL collective operations, provides near-InfiniBand performance on Ethernet infrastructure, and scales efficiently for large GPU clusters.
Question 12 of 60
You are optimizing multi-node LLM training on a 64-node H100 cluster with HDR InfiniBand fabric. Gradient synchronization across nodes creates significant communication overhead during backpropagation. Which technology provides the MOST effective all-reduce optimization for this scenario?
Explanation:
SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) is the optimal solution for all-reduce optimization in large-scale distributed training. It performs gradient aggregation directly within InfiniBand switches, eliminating host-based reduction overhead. This in-network reduction approach reduces collective operation latency by 40-60%, frees GPU resources for computation, and scales efficiently across hundreds of nodes. SHARP is specifically designed for AI workloads with frequent all-reduce patterns during training.
Question 13 of 60
You are training a 175B parameter LLM on a DGX H100 system with 8 GPUs. The model is too large to fit on a single GPU, but each layer fits in GPU memory. Which model parallelism approach would minimize inter-GPU communication overhead while maximizing GPU utilization?
Explanation:
Pipeline parallelism is the optimal choice when layers fit in GPU memory but the full model doesn't. It partitions sequential layers across GPUs and uses micro-batching to overlap computation and communication, minimizing idle time. Communication occurs only at pipeline stage boundaries (passing activations forward and gradients backward), resulting in significantly lower inter-GPU traffic than tensor parallelism, which requires synchronization after every layer operation. For an 8-GPU DGX H100 system with a 175B model, pipeline parallelism maximizes GPU utilization while keeping communication minimal.
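The micro-batching trade-off has a standard closed form: for a simple GPipe-style schedule with p stages and m micro-batches, the idle ("bubble") fraction is (p - 1) / (m + p - 1), so more micro-batches shrink the bubble. A quick sketch:

```python
def bubble_fraction(stages, microbatches):
    """Idle ('bubble') fraction for a simple GPipe-style schedule:
    (p - 1) / (m + p - 1) for p pipeline stages and m micro-batches."""
    return (stages - 1) / (microbatches + stages - 1)

# 8 pipeline stages (one per GPU on a DGX H100):
print(round(bubble_fraction(8, 8), 3))   # 0.467 -> nearly half idle
print(round(bubble_fraction(8, 64), 3))  # 0.099 -> bubble shrinks with m
```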
Question 14 of 60
What is the purpose of Layer 3 and Layer 4 protocols in RoCE (RDMA over Converged Ethernet) implementations?
Explanation:
RoCE (RDMA over Converged Ethernet) leverages Layer 3 IP for routing packets across networks and Layer 4 UDP for transport services. This combination enables high-performance, low-latency RDMA operations over standard Ethernet infrastructure, which is critical for multi-node GPU clusters using technologies like GPUDirect RDMA. UDP is chosen over TCP because RDMA implements its own reliability layer.
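The encapsulation stack can be written out explicitly: RoCE v2 carries the InfiniBand Base Transport Header (BTH) inside UDP (well-known destination port 4791) over IP over Ethernet. The sketch below sums the standard fixed header sizes, assuming IPv4 with no options and no VLAN tag, and ignoring the ICRC trailer.

```python
# RoCE v2 encapsulation: IB transport (BTH) over UDP/IP/Ethernet.
# UDP destination port 4791 identifies RoCE v2. Sizes are the standard
# fixed header sizes; IPv4 without options, untagged Ethernet, ICRC
# trailer excluded.
ROCE_V2_UDP_PORT = 4791
headers = [("Ethernet", 14), ("IPv4", 20), ("UDP", 8), ("IB BTH", 12)]
overhead_bytes = sum(size for _, size in headers)
print(ROCE_V2_UDP_PORT, overhead_bytes)  # 4791 54
```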
Question 15 of 60
Your team is deploying a multi-node distributed training cluster with 64 H100 GPUs across 8 DGX H100 nodes connected via InfiniBand HDR. When would you use NCCL's automatic topology detection feature instead of manually specifying the communication topology?
Explanation:
NCCL's automatic topology detection is most beneficial in heterogeneous or dynamic environments where network configurations vary or change at runtime. It automatically discovers optimal communication paths across NVLink, InfiniBand, and PCIe interconnects. For homogeneous production clusters with stable topologies, manual specification via NCCL_TOPO_FILE provides better performance and determinism. Single-node and inference workloads gain minimal benefit from topology detection due to fixed local connectivity or latency constraints.
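Pinning a manual topology is done through environment variables set before NCCL initializes. NCCL_TOPO_FILE, NCCL_DEBUG, and NCCL_DEBUG_SUBSYS are real NCCL variables; the XML path below is a hypothetical example.

```python
import os

# Pin a manual topology for a homogeneous production cluster, and turn on
# graph-search logging so the chosen rings/trees can be verified.
os.environ["NCCL_TOPO_FILE"] = "/etc/nccl/dgx-h100-topo.xml"  # hypothetical path
os.environ["NCCL_DEBUG"] = "INFO"
os.environ["NCCL_DEBUG_SUBSYS"] = "GRAPH"  # log topology/graph decisions
# ...launch the training job after these are set, e.g. via torchrun.
```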
Question 16 of 60
Your AI infrastructure team needs real-time monitoring of GPU cluster network performance with sub-second latency for anomaly detection. The monitoring system must handle high-frequency updates from 100+ switches without overwhelming the management plane. Which telemetry approach would be MOST effective for this requirement?
Explanation:
gNMI streaming telemetry with gRPC is the optimal choice for real-time, high-frequency network monitoring in GPU clusters. Unlike SNMP's inefficient polling or syslog's event-focused approach, gNMI uses efficient push-based streaming with structured data delivery. It supports both on-change and sampled subscriptions, enabling sub-second updates without overwhelming the management plane, making it ideal for large-scale AI infrastructure monitoring requiring immediate anomaly detection.
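The shape of a sampled-stream subscription is worth seeing; the sketch below models it as a plain dict rather than the protobuf a real gNMI client would send. The interface-counters path is a common OpenConfig path, and the 500 ms interval is an illustrative choice for sub-second updates (gNMI expresses sample_interval in nanoseconds).

```python
# Illustrative shape of a gNMI Subscribe request for sampled streaming
# telemetry (a real client encodes this as protobuf over gRPC).
subscription = {
    "subscribe": {
        "mode": "STREAM",
        "subscription": [
            {
                "path": "/interfaces/interface/state/counters",
                "mode": "SAMPLE",
                "sample_interval": 500_000_000,  # nanoseconds -> 500 ms
            }
        ],
    }
}
interval_ms = subscription["subscribe"]["subscription"][0]["sample_interval"] / 1e6
print(interval_ms)  # 500.0 -> sub-second push updates, no polling
```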
Question 17 of 60
Your InfiniBand subnet manager has configured linear forwarding tables (LFTs) for a 648-port director switch connecting compute nodes in a GPU cluster. During high-traffic periods, packets destined for LID 0x4F2 experience routing delays. Which mechanism does the switch use to determine the output port for this destination?
Explanation:
Linear forwarding tables implement packet routing through direct indexing where the destination LID serves as the array index (LFT[LID]) to retrieve the output port number. This provides O(1) constant-time lookup essential for maintaining InfiniBand's low-latency guarantees in GPU clusters. The table is pre-populated by the subnet manager using algorithms like UPDN or FTREE, eliminating per-packet computation overhead.
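The direct-indexing mechanism is trivially small in code. A sketch, with an illustrative port assignment for the LID from the scenario (the table size and port number are assumptions for the example):

```python
# A linear forwarding table is just an array indexed by destination LID:
# one memory read yields the egress port, O(1) per packet.
NUM_LIDS = 48 * 1024
lft = [0] * NUM_LIDS           # populated by the subnet manager (UPDN/FTREE)
lft[0x4F2] = 17                # illustrative: LID 0x4F2 exits via port 17

def output_port(lft, dest_lid):
    return lft[dest_lid]       # direct index: no search, no hashing

print(output_port(lft, 0x4F2))  # 17
```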
Question 18 of 60
Which statement best describes the relationship between GUIDs and LIDs in InfiniBand addressing?
Explanation:
InfiniBand uses a two-tier addressing scheme for node identification. GUIDs are permanent 64-bit globally unique identifiers assigned during manufacturing, similar to Ethernet MAC addresses, used for device identification and management. LIDs are temporary 16-bit locally unique addresses dynamically assigned by the subnet manager for efficient packet routing within a subnet. The subnet manager maps GUIDs to LIDs during subnet initialization.
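The two-tier scheme amounts to a mapping from wide, permanent identifiers to narrow, subnet-local ones. A sketch with invented GUID/LID values:

```python
# Two-tier addressing: permanent 64-bit GUIDs mapped by the subnet
# manager to temporary 16-bit LIDs. GUID/LID values are illustrative.
guid_to_lid = {
    0x0002C903_00A1B2C3: 0x0012,
    0x0002C903_00A1B2C4: 0x0013,
}

for guid, lid in guid_to_lid.items():
    assert guid < 2**64          # GUID fits in 64 bits
    assert 0 < lid < 2**16       # LID fits in 16 bits

print(hex(guid_to_lid[0x0002C90300A1B2C3]))  # 0x12
```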
Question 19 of 60
19. Question
A 128-node H100 GPU cluster is experiencing 40% degradation in multi-node LLM training performance during all-reduce operations. Network analysis shows individual node-to-node bandwidth at 400 Gbps, but aggregate cross-rack traffic peaks at 12.8 Tbps. What is the primary bottleneck limiting cluster communication capacity?
Correct
Network bisection bandwidth represents the aggregate throughput capacity when half the cluster communicates with the other half, critical for distributed training's all-reduce operations. This scenario demonstrates a classic bisection bottleneck: individual links perform well (400 Gbps), but spine-layer oversubscription (roughly 2:1 effective) limits aggregate cross-rack capacity to 12.8 Tbps versus the required 25.6+ Tbps. For 128-node H100 clusters, full bisection bandwidth (1:1 oversubscription) is essential to prevent collective-operation slowdowns during multi-node training.
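The figures in this scenario check out with simple arithmetic (numbers taken from the question):

```python
# Full bisection bandwidth: half the nodes transmitting to the other half.
nodes = 128
link_gbps = 400                              # per-node injection bandwidth
required_tbps = (nodes // 2) * link_gbps / 1000

observed_tbps = 12.8                         # measured cross-rack peak
print(required_tbps)                         # -> 25.6
print(required_tbps / observed_tbps)         # -> 2.0 (observed is half of required)
```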
Question 20 of 60
20. Question
A data center is deploying a multi-node AI training cluster with 64 H100 GPUs requiring high-bandwidth, low-latency interconnect for distributed training workloads. The infrastructure team needs to select InfiniBand speeds for Quantum switches to support optimal NCCL communication performance. Which InfiniBand speed configuration should they implement?
Correct
NDR 400G InfiniBand with Quantum-2 switches is the optimal choice for H100 multi-node training clusters. It provides 400 Gbps bandwidth matching H100's communication requirements, supports sub-microsecond latency for efficient NCCL operations, and enables GPUDirect RDMA for direct GPU memory access across nodes. XDR 800G is unnecessary overprovisioning for current architectures, while HDR 200G creates bandwidth bottlenecks that degrade training performance.
Question 21 of 60
21. Question
Your organization is training a 70B parameter LLM across 32 DGX H100 nodes using NCCL for collective operations. Training performance is bottlenecked by AllReduce operations consuming excessive InfiniBand bandwidth. When would implementing SHARP aggregation trees provide the most benefit for this workload?
Correct
SHARP aggregation trees provide maximum benefit during collective operations like AllReduce in distributed training, where all nodes must synchronize data simultaneously. By performing reduction operations within InfiniBand switches rather than at endpoints, SHARP reduces network traffic and latency. This is particularly effective for gradient synchronization in large-scale LLM training where frequent AllReduce operations consume significant bandwidth.
Question 22 of 60
22. Question
A datacenter administrator needs to temporarily isolate a faulty switch port in an InfiniBand fabric managed by UFM without physically disconnecting cables. The port must be prevented from participating in subnet management and routing while preserving configuration for future re-enablement. Which UFM operation accomplishes this requirement?
Correct
The UFM port disable operation provides administrative control to temporarily isolate problematic ports while maintaining configuration integrity. This is the standard method for controlled port isolation in InfiniBand fabric management, allowing administrators to prevent faulty ports from affecting fabric operations without losing configuration settings. Port re-enablement is achieved through the corresponding enable operation, restoring the port to active service with preserved settings.
Question 23 of 60
23. Question
A multi-GPU inference cluster using ConnectX-7 adapters with DPDK integration shows unexpectedly low packet processing throughput (45% of theoretical maximum). The DPDK Ethernet ports are configured for GPU-direct data plane acceleration. What is the most likely cause of the performance degradation?
Correct
DPDK poll-mode drivers require adequate hugepage allocation for zero-copy operations and efficient packet buffer management. Insufficient hugepages force DPDK to fall back to interrupt-driven processing, eliminating the core advantages of data plane acceleration. This fallback introduces significant context switching overhead and disrupts the continuous polling model essential for ConnectX GPUDirect integration. The 45% throughput indicates operational mode degradation, not configuration tuning issues. Proper hugepage allocation (typically 1GB pages) is mandatory for production DPDK deployments with GPU-direct acceleration on ConnectX adapters.
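For reference, reserving 1 GiB hugepages on a Linux host typically looks like the following (page counts are illustrative and deployment-specific; run as root and confirm the paths against your distribution):

```
# At boot, via the kernel command line:
#   default_hugepagesz=1G hugepagesz=1G hugepages=16
# Or at runtime, per NUMA node:
echo 16 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
mkdir -p /dev/hugepages1G
mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G
# Verify the reservation:
grep Huge /proc/meminfo
```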
Question 24 of 60
24. Question
An enterprise is designing an EVPN-VXLAN overlay network for GPU cluster interconnect. The network architect must select the appropriate EVPN route types for optimal MAC/IP advertisement and host reachability. Which combination of EVPN route types is essential for integrating Layer 2 MAC learning with Layer 3 IP routing in this EVPN-VXLAN fabric?
Correct
EVPN-VXLAN integration requires understanding how different EVPN route types work together to provide comprehensive overlay networking. Type 2 routes are fundamental for MAC/IP advertisement, enabling distributed learning without traditional flooding mechanisms. Type 5 routes extend EVPN beyond Layer 2 by advertising IP prefixes for inter-subnet routing, critical for scalable GPU cluster deployments requiring both local Ethernet segment connectivity and routed inter-subnet communication. This combination forms the foundation of EVPN concepts, integrating Layer 2 MAC learning with Layer 3 IP routing capabilities across VXLAN overlay networks.
Question 25 of 60
25. Question
Your data center operates a 256-node AI training cluster with Spectrum-4 switches achieving 51.2 Tbps throughput. Network monitoring shows asymmetric traffic patterns with 70% east-west GPU-to-GPU flows and 30% north-south storage traffic. Which optimization strategy would maximize the 51.2 Tbps switching capacity utilization?
Correct
Maximizing Spectrum-4's 51.2 Tbps switching capacity requires adaptive routing with ECMP across all 64x800GbE ports. This leverages the ASIC's non-blocking architecture to distribute elephant flows (GPU NCCL traffic) dynamically, preventing hotspots and utilizing full bisection bandwidth. Static oversubscription, misaligned load balancing, or buffer carving based on traffic ratios fail to address the core requirement: efficient path utilization across the entire switching fabric for dominant east-west patterns.
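The flow-hashing idea behind ECMP can be illustrated with a toy sketch (CRC32 stands in for the switch ASIC's hash function; all field values are placeholders):

```python
import zlib

def ecmp_uplink(src_ip: str, dst_ip: str, src_port: int,
                dst_port: int, proto: int, num_uplinks: int) -> int:
    """Hash the flow 5-tuple so each flow sticks to one equal-cost path
    while distinct flows spread across all uplinks."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % num_uplinks

# 100 RoCE flows (UDP/4791) from varying source ports across 64 uplinks:
used = {ecmp_uplink("10.0.0.1", "10.0.1.1", 40000 + i, 4791, 17, 64)
        for i in range(100)}
print(f"{len(used)} of 64 uplinks carry traffic")
```

Adaptive routing goes a step further than this static per-flow hash by also factoring in live queue depths when picking the egress port.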
Question 26 of 60
26. Question
A datacenter engineer needs to configure an InfiniBand switch running Onyx Switch OS to enable port 1/1 with a data rate of 100Gbps and then verify the configuration. Which sequence of Onyx CLI commands accomplishes this task?
Correct
Onyx Switch OS uses a hierarchical CLI similar to Cisco IOS for InfiniBand switch management. Configuring InfiniBand ports requires entering configuration mode with 'configure terminal', selecting the specific InfiniBand interface using 'interface ib X/Y' notation, setting the speed with the 'speed' command (where 100 = 100Gbps), and administratively enabling the port with 'no shutdown'. Verification uses 'show interfaces ib' commands to display port status and operational state.
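Assembled into a session, the sequence described above would look roughly like this (prompts abbreviated; exact syntax can vary between Onyx releases, so treat this as a sketch):

```
switch > enable
switch # configure terminal
switch (config) # interface ib 1/1
switch (config interface ib 1/1) # speed 100
switch (config interface ib 1/1) # no shutdown
switch (config interface ib 1/1) # exit
switch (config) # show interfaces ib 1/1
```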
Question 27 of 60
27. Question
A data center operations team needs to monitor GPU server network connectivity and troubleshoot RDMA performance issues across their NVIDIA DGX H100 cluster connected via Ethernet fabric. Which approach achieves comprehensive network monitoring integration?
Correct
NetQ provides purpose-built network monitoring for Ethernet fabrics supporting GPU clusters by deploying agents on both switches and hosts. This enables comprehensive visibility into RDMA/RoCE performance metrics, congestion indicators (PFC, ECN), and end-to-end path analysis critical for troubleshooting multi-GPU distributed training. Unlike generic monitoring tools, NetQ understands GPU networking requirements and provides actionable insights for optimizing GPUDirect RDMA performance across DGX infrastructures.
Question 28 of 60
28. Question
A multi-node LLM training job using 8x H100 nodes with InfiniBand experiences inconsistent performance. The team sets NCCL_DEBUG=INFO and observes "Using internal network" instead of "NET/IB" for inter-node communication. They've verified GPUDirect RDMA is enabled on all nodes. What is the most likely integration issue preventing NCCL from utilizing InfiniBand?
Correct
NCCL environment variables must be properly integrated to configure runtime networking behavior. NCCL_SOCKET_IFNAME is critical for multi-node deployments with multiple network interfaces, as it explicitly directs NCCL to bind to InfiniBand interfaces rather than defaulting to Ethernet. Without this configuration, NCCL's automatic detection may select incorrect interfaces, causing fallback to TCP/IP transport ("internal network"). This integration pattern is essential in production clusters where IB and Ethernet coexist, requiring explicit runtime configuration through environment variables to achieve optimal GPUDirect RDMA performance.
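A typical fix is to set the interface explicitly before launching the job (interface and HCA names here are examples; use the output of `ibdev2netdev` on your nodes):

```
export NCCL_SOCKET_IFNAME=ib0   # bind NCCL's bootstrap/socket traffic to IB
export NCCL_IB_HCA=mlx5_0       # restrict RDMA transport to this HCA
export NCCL_DEBUG=INFO          # rerun and confirm the log now shows "NET/IB"
```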
Question 29 of 60
29. Question
Your AI training cluster uses 16 DGX H100 nodes connected via NDR InfiniBand for multi-node distributed training. You need to configure the InfiniBand switches to optimize NCCL collective operations with GPUDirect RDMA. When would you use IB switch configuration in Onyx Switch OS for this setup?
Correct
IB switch configuration in Onyx Switch OS is critical for optimizing InfiniBand fabrics supporting RDMA traffic in distributed AI training. Proper configuration includes adaptive routing for load balancing, congestion control for lossless operation, and QoS settings to prioritize NCCL collective traffic. This ensures GPUDirect RDMA efficiency across multi-node clusters, preventing packet loss and congestion during synchronized all-reduce operations essential for distributed training scalability.
Question 30 of 60
30. Question
A datacenter administrator monitors an InfiniBand fabric supporting multi-node H100 training clusters and notices sporadic training slowdowns. UFM Monitoring shows normal link utilization but intermittent packet loss. Which performance counter combination most effectively identifies the root cause of throughput degradation?
Correct
UFM performance counters provide granular visibility into InfiniBand fabric health. PortRcvErrors captures physical layer problems (malformed packets, CRC failures) while PortXmitDiscards identifies buffer exhaustion from congestion. This combination distinguishes between hardware faults requiring cable replacement versus traffic engineering issues needing QoS adjustments, critical for maintaining NCCL collective operation efficiency in multi-node GPU training.
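These counters can also be read directly with `perfquery` from the infiniband-diags package (LID 42 and port 1 are placeholders):

```
perfquery 42 1        # dump counters for LID 42, port 1; inspect
                      # PortRcvErrors and PortXmitDiscards in the output
perfquery -R 42 1     # reset the counters after recording a baseline
```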
Question 31 of 60
31. Question
An administrator needs to upgrade firmware on 24 NVIDIA Spectrum-X switches running Cumulus Linux in a production AI training cluster. The upgrade must minimize downtime while ensuring GPU training jobs experience no communication disruptions. Which firmware update approach best integrates with the cluster's requirements?
Correct
ISSU with rolling upgrades represents the optimal firmware-update integration for production AI clusters, enabling hitless failover that maintains data-plane forwarding during control-plane updates. The spine-first, paired-leaf approach ensures routing consistency and redundancy, while coordinating upgrades with GPU checkpoint intervals minimizes impact on training workloads. This methodology specifically addresses AI infrastructure requirements for zero-disruption RDMA communication, leveraging Cumulus Linux's advanced upgrade capabilities designed for high-performance computing environments where GPU utilization and training continuity are critical business requirements.
Question 32 of 60
32. Question
A network engineer needs to deploy Cumulus Linux across 50 new switches in a data center fabric. The deployment must support automated configuration, zero-touch provisioning, and integration with existing Ansible workflows. Which technology is best suited for this Cumulus Linux deployment?
Correct
ONIE (Open Network Install Environment) is the standard deployment technology for Cumulus Linux, specifically designed for automated, zero-touch provisioning on bare-metal switches. It eliminates manual installation, automatically discovers provisioning servers, installs Cumulus Linux images, and integrates with configuration management tools like Ansible. For large-scale deployments, ONIE provides the automation, scalability, and consistency required for modern data center fabric installations.
Question 33 of 60
33. Question
A data center architect is deploying an 8-node DGX H100 cluster for multi-node LLM training with NCCL 2.20. The InfiniBand switches running Onyx OS require configuration to support GPUDirect RDMA. What is the critical component that must be enabled on the IB switch configuration to ensure optimal GPU-to-GPU communication across nodes?
Correct
For InfiniBand switches supporting DGX H100 clusters, SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) is the critical enabler for optimal multi-node training. SHARP offloads NCCL collective operations to IB switch hardware, performing in-network aggregation that reduces GPU overhead and minimizes latency. This is essential for efficient gradient synchronization across nodes during distributed LLM training. While features like Adaptive Routing and QoS enhance performance, SHARP is the foundational component that directly accelerates GPUDirect RDMA-based collective communications required by NCCL 2.20 in multi-GPU training workloads.
Question 34 of 60
34. Question
During InfiniBand fabric initialization, multiple Subnet Managers are configured for high availability. Which approach ensures proper SM discovery and prevents split-brain scenarios when the fabric powers on?
Correct
SM discovery during fabric initialization follows priority-based master election defined in InfiniBand specifications. Multiple SMs exchange SMINFO attributes to compare priorities, with the highest-priority SM claiming master role and performing initial topology discovery. Standby SMs monitor fabric health and assume master role only upon failure detection, preventing split-brain scenarios and ensuring consistent fabric management throughout initialization.
Question 35 of 60
35. Question
A datacenter operations team needs to continuously monitor the security posture of their InfiniBand fabric infrastructure, including detecting anomalous traffic patterns, unauthorized access attempts, and configuration vulnerabilities across 200+ switches. Which technology provides comprehensive fabric security monitoring and threat detection capabilities?
Correct
UFM Cyber-AI is NVIDIA's specialized solution for InfiniBand fabric security monitoring, providing AI-powered anomaly detection, threat identification, and continuous security posture assessment. Unlike GPU monitoring tools (DCGM), communication libraries (NCCL), or basic subnet managers, UFM Cyber-AI specifically addresses fabric security with comprehensive monitoring, detecting unauthorized access, configuration vulnerabilities, and traffic anomalies across the entire InfiniBand infrastructure.
Question 36 of 60
36. Question
What is the Spectrum-4 ASIC in the context of NVIDIA Spectrum switches?
Correct
Spectrum-4 ASIC is NVIDIA's fourth-generation Ethernet switch silicon delivering 51.2 Tbps switching capacity. It powers NVIDIA Spectrum-4 switches, providing high-performance network fabric for AI clusters and datacenters with ultra-low latency, advanced telemetry, and support for modern Ethernet standards critical for multi-GPU training and inference workloads.
Question 37 of 60
37. Question
What is the primary purpose of aggregation trees in SHARP (Scalable Hierarchical Aggregation and Reduction Protocol)?
Correct
SHARP aggregation trees accelerate collective operations by offloading AllReduce, broadcast, and other collective primitives to InfiniBand switches. Switches perform partial reductions in-network hierarchically, reducing data movement to host CPUs and GPU overhead. This is critical for distributed training where NCCL collective operations dominate communication patterns, achieving up to 5x faster AllReduce performance on large-scale clusters.
Question 38 of 60
38. Question
Your team is configuring a multi-node H100 cluster for LLM training with InfiniBand connectivity. During initial testing, you observe suboptimal communication performance between nodes. Which NCCL environment variable should you configure to enable GPUDirect RDMA for direct GPU-to-GPU transfers across nodes?
Correct
NCCL_IB_DISABLE=0 is the correct runtime configuration to enable InfiniBand transport with GPUDirect RDMA for multi-node training. This allows NCCL to bypass CPU memory and perform direct GPU-to-GPU transfers across the InfiniBand network, critical for scaling H100 clusters. Other options either disable optimization features (GDR_LEVEL=0, P2P_DISABLE=1) or force suboptimal transport methods (SOCKET_IFNAME=eth0), reducing the performance benefits of InfiniBand connectivity.
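As a minimal sketch of how this is typically applied (the variable names are standard NCCL ones, but the GDR level value and the debug setting are illustrative choices, not a validated tuning), the environment is usually prepared before the training process initializes its process group:

```python
import os

# Enable InfiniBand transport (0 = do NOT disable IB).
os.environ["NCCL_IB_DISABLE"] = "0"
# Allow GPUDirect RDMA where the PCIe topology permits; "2" is one common
# setting -- consult the NCCL docs for exact semantics on your platform.
os.environ["NCCL_NET_GDR_LEVEL"] = "2"
# Optional: have NCCL print its transport selection at startup so you can
# verify that IB + GDRDMA was actually chosen.
os.environ["NCCL_DEBUG"] = "INFO"

# ... torch.distributed / NCCL initialization would follow here ...
print(os.environ["NCCL_IB_DISABLE"])  # → 0
```

Setting these in the launcher (or job script) rather than inside the training code has the same effect; the key point is that they must be in place before NCCL's first communicator is created.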
Question 39 of 60
39. Question
Your datacenter team needs to identify performance degradation patterns across a 256-GPU H100 cluster over the past 90 days. UFM has been collecting telemetry data, but quarterly reports show inconsistent bandwidth utilization trends. What is the critical component for implementing effective historical analysis with UFM trending and reporting?
Correct
Effective historical analysis in UFM requires systematic data collection through UFM Telemetry with consistent sampling intervals and integration with time-series databases for long-term metric retention. This architecture enables correlation analysis across fabric components, statistical aggregation for trend identification, and scalable storage beyond UFM's default limits. Real-time alerting, dashboard refresh rates, and manual CSV exports address different use cases but lack the temporal continuity, query capabilities, and automated aggregation essential for identifying performance degradation patterns over 90-day periods in large-scale InfiniBand fabrics.
Question 40 of 60
40. Question
Your multi-node AI cluster uses ConnectX-7 adapters with HDR InfiniBand for distributed training. After installation, you observe inconsistent link speeds across nodes, with some ports negotiating at lower rates. Which approach should you use to standardize link speed and mode settings across all ConnectX adapters?
Correct
ConnectX HCA port configuration for link speed and mode requires firmware-level settings using mlxconfig. This tool provides persistent configuration that survives reboots, essential for production clusters. It configures parameters like link speed (HDR 200Gbps, NDR 400Gbps), link type (IB/Ethernet), and port settings. Alternative tools like ibportstate and mlxlink serve diagnostic purposes but don't provide persistent firmware configuration needed for cluster standardization.
Question 41 of 60
41. Question
Your InfiniBand fabric management team needs to configure Subnet Manager redundancy across multiple UFM instances to ensure continuous fabric operation during primary SM failures. Which UFM configuration approach provides automatic failover for the Subnet Manager role while maintaining fabric stability?
Correct
UFM High Availability mode is the recommended approach for SM redundancy in production InfiniBand fabrics. It provides automatic failover through priority-based SM election, where standby UFM instances continuously monitor the primary SM's health. Upon detecting primary failure, the highest-priority standby seamlessly assumes the active SM role, maintaining fabric stability without manual intervention. This configuration ensures continuous subnet management while avoiding split-brain scenarios through coordinated state transitions.
Question 42 of 60
42. Question
What is the primary purpose of DPDK integration with ConnectX Ethernet adapters in high-performance networking environments?
Correct
DPDK (Data Plane Development Kit) integration with ConnectX Ethernet adapters enables kernel bypass networking, allowing applications to process packets directly in user space. This eliminates kernel processing overhead, reduces latency significantly, and maximizes packet throughput through polling mode drivers and direct memory access. DPDK is essential for high-performance networking applications requiring accelerated data plane performance, such as network functions virtualization and telecommunications workloads.
Question 43 of 60
43. Question
A data center operations team needs to track GPU configurations, firmware versions, and deployment locations across 500 DGX H100 systems distributed across multiple sites. When would you use NetQ‘s Inventory management for asset tracking in this scenario?
Correct
NetQ Inventory management is ideal for large-scale distributed infrastructure requiring automated asset tracking, real-time synchronization, and continuous validation. It eliminates manual tracking overhead by automatically discovering hardware configurations, firmware versions, and network topology across 500+ systems. The solution provides immediate visibility into configuration changes, enabling proactive management and compliance validation across multiple data center sites with minimal operational burden.
Question 44 of 60
44. Question
You are configuring RDMA communication for a multi-node H100 training cluster using InfiniBand HDR. Applications require immediate notification of completed RDMA Write operations to minimize latency in gradient synchronization. Which Completion Queue configuration approach ensures the lowest latency operation completion handling?
Correct
Busy-wait polling in dedicated threads provides minimum latency for RDMA completion handling by eliminating context switches and interrupt delays. For gradient synchronization in distributed training, microsecond-level responsiveness to completed RDMA operations directly impacts training throughput. While consuming CPU resources, this approach ensures immediate detection of Work Completions, making it optimal for latency-critical multi-node communication in H100 clusters using InfiniBand.
Question 45 of 60
45. Question
A datacenter fabric experiences periodic packet drops on lossless storage traffic despite PFC being enabled. Analysis shows PFC pause frames are being sent, but drops still occur during traffic bursts. Which troubleshooting approach most effectively identifies the flow control issue?
Correct
PFC flow control requires adequate headroom buffers to absorb packets in-flight during pause frame propagation (RTT). Drops despite pause frames indicate buffer overflow before pause takes effect. Headroom sizing must account for link speed, cable latency, and MTU. Typical calculation: headroom = (2 × link_speed × RTT) + MTU, adjusted for burst characteristics.
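The headroom rule quoted above can be sketched numerically. This is a hypothetical illustration of that arithmetic only; the link speed, RTT, and MTU figures are example values, not vendor-validated buffer settings:

```python
# headroom = (2 * link_speed * RTT) + MTU, per the sizing rule above.
def pfc_headroom_bytes(link_speed_bps: float, rtt_s: float, mtu_bytes: int) -> int:
    """Bytes in flight during one round trip, doubled, plus one MTU."""
    in_flight_bits = 2 * link_speed_bps * rtt_s
    return int(in_flight_bits / 8) + mtu_bytes

# Example: 400G link, 1 microsecond RTT (roughly 100 m of fiber), 9216-byte MTU.
headroom = pfc_headroom_bytes(400e9, 1e-6, 9216)
print(headroom)  # → 109216
```

The point of the exercise: at 400G, even a microsecond of pause-frame propagation leaves on the order of 100 KB in flight per priority, which the headroom buffer must absorb or drops occur despite PFC firing.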
Question 46 of 60
46. Question
A team is training a 70B parameter LLM across 64 H100 GPUs (8 nodes with 8 GPUs each) using tensor and pipeline parallelism. Training throughput drops significantly when scaling beyond 4 nodes, despite InfiniBand connectivity showing normal bandwidth. GPU utilization drops from 85% to 45% on nodes 5-8. What is the most likely cause?
Correct
Pipeline parallelism scaling issues manifest as bubble overhead where later pipeline stages idle waiting for micro-batches. With 64 GPUs across 8 nodes, excessive pipeline depth creates asymmetric utilization where final stages (nodes 5-8) starve. Solutions include reducing pipeline parallel degree while increasing tensor parallel degree, increasing micro-batch count to fill bubbles, or using interleaved pipeline schedules (1F1B). Network bandwidth being normal rules out communication bottlenecks.
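The bubble overhead described above can be illustrated with the widely used pipeline-bubble approximation, bubble fraction ≈ (p − 1) / (m + p − 1) for p stages and m micro-batches. The stage and micro-batch counts below are illustrative, not taken from the scenario's actual schedule:

```python
def bubble_fraction(stages: int, micro_batches: int) -> float:
    """Idle fraction of a naive pipeline schedule: (p - 1) / (m + p - 1)."""
    return (stages - 1) / (micro_batches + stages - 1)

# 8 pipeline stages (one per node) fed by only 4 micro-batches:
print(round(bubble_fraction(8, 4), 3))   # → 0.636 (most of the pipe idles)

# The same depth with 32 micro-batches fills the bubbles substantially:
print(round(bubble_fraction(8, 32), 3))  # → 0.179
```

This is why the recommended fixes all attack the same ratio: fewer stages, more micro-batches, or an interleaved (1F1B) schedule that overlaps forward and backward passes.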
Question 47 of 60
47. Question
What is Adaptive Routing (AR) configuration in the context of NVIDIA InfiniBand fabric architecture?
Correct
Adaptive Routing configuration enables InfiniBand fabrics to dynamically select optimal network paths based on real-time congestion monitoring. This is crucial for NVIDIA GPU clusters performing multi-node distributed training, where NCCL communication benefits significantly from reduced latency and avoided hotspots. AR overcomes static routing limitations by responding to actual fabric conditions.
Question 48 of 60
48. Question
A research team is deploying a multi-node GPU cluster for distributed LLM training. They need to understand how InfiniBand protocol stack organizes communication layers for GPUDirect RDMA operations. Which approach correctly describes the protocol stack organization that enables direct GPU-to-GPU transfers across nodes?
Correct
InfiniBand architecture uses a five-layer protocol stack specifically designed for RDMA operations: Verb layer (user-space API), Transport layer (reliable delivery with queue pairs), Network layer (subnet routing with LIDs), Link layer (flow control), and Physical layer (signaling). This organization enables GPUDirect RDMA by allowing direct memory access between GPUs across nodes without kernel involvement, critical for efficient multi-node distributed training with NCCL.
Question 49 of 60
49. Question
A data center is deploying a 32-node H100 cluster for multi-modal LLM training requiring optimal GPU-to-GPU bandwidth. When configuring rail-optimized topology, which approach ensures maximum bisection bandwidth while minimizing network hops between GPUs?
Correct
Rail-optimized GPU topology dedicates each GPU's network interface to separate leaf switches (rails), creating independent communication paths through spine switches. This architecture maximizes bisection bandwidth by eliminating contention between GPU communication streams. Each rail connects to all spine switches, enabling non-blocking GPU-to-GPU communication across the cluster—essential for NCCL all-reduce operations in distributed training.
Question 50 of 60
50. Question
What is the primary capability of the NVIDIA Spectrum SN5000 series switches for AI infrastructure?
Correct
The NVIDIA Spectrum SN5000 series represents the latest generation of Ethernet switches purpose-built for AI infrastructure, providing 400G and 800G port speeds with ultra-low latency and advanced congestion control. These switches enable efficient multi-node GPU clusters using Ethernet fabrics with RoCE (RDMA over Converged Ethernet) for distributed training workloads on H100/H200 platforms.
Question 51 of 60
51. Question
An AI research team is deploying a distributed LLM training workload across 16 H100 nodes using NCCL for gradient synchronization. They observe inconsistent iteration times ranging from 1.2s to 2.8s per step, despite average network latency of 15 µs. Which network characteristic most likely explains the training instability?
Correct
Network jitter is the variation in latency over time, critical for synchronous distributed training workloads. In multi-node LLM training, NCCL AllReduce operations require all GPUs to synchronize gradients simultaneously. High jitter means some iterations experience 10 µs latency while others see 50 µs+, causing unpredictable wait times. This variability compounds across 16 nodes, creating the observed 1.6s variance in iteration times. Low average latency with high jitter is more problematic than slightly higher but consistent latency for AI workloads requiring tight synchronization.
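A toy numeric illustration (all latency values invented) of why the slowest participant, not the average, sets the step time in a synchronous collective:

```python
# In a synchronous AllReduce, every rank waits for the slowest one,
# so a step's communication cost tracks the MAX latency, not the mean.
low_jitter  = [15, 16, 15, 17, 15, 16, 15, 16]   # microseconds per rank
high_jitter = [10, 12, 11, 55, 10, 13, 48, 11]   # lower floor, far wider spread

def sync_step_cost(latencies_us):
    """Per-step communication cost for a synchronous collective."""
    return max(latencies_us)

print(sync_step_cost(low_jitter))   # → 17
print(sync_step_cost(high_jitter))  # → 55
```

Even though the high-jitter fabric often delivers the lowest individual latencies, its tail drags every step, which is exactly the inconsistency the team observed.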
Question 52 of 60
52. Question
Your AI training cluster spans multiple subnets across different data center pods, requiring GPU-to-GPU communication for distributed training workloads. The network team can provide either InfiniBand or Ethernet infrastructure. When would you specifically choose RoCEv2 protocol for RDMA over UDP/IP instead of InfiniBand?
Correct
RoCEv2 protocol is specifically designed for RDMA over UDP/IP to enable Layer 3 routing across subnets and IP networks. Its primary advantage over InfiniBand is routable RDMA traffic using standard Ethernet infrastructure, making it ideal for multi-subnet deployments. However, InfiniBand provides superior latency for single-domain clusters, and RoCEv1 is more efficient within Layer 2 boundaries where routing isn't required.
Question 53 of 60
53. Question
A team is deploying a multi-node H100 training cluster and needs to minimize GPU-to-GPU communication latency across nodes while bypassing CPU overhead. Which technology best addresses both physical layer connectivity and data link layer communication for this infrastructure?
Correct
InfiniBand HDR with GPUDirect RDMA is the optimal solution for multi-node GPU clusters. InfiniBand operates at OSI Layer 1 (physical signaling at 200 Gbps) and Layer 2 (reliable data link protocol), while GPUDirect RDMA enables direct GPU memory access without CPU intervention. This combination minimizes latency and maximizes bandwidth for distributed training. NVLink is intra-node only, PCIe doesn't span nodes, and RoCE has higher latency than native InfiniBand for AI workloads.
Question 54 of 60
54. Question
What is the primary purpose of Telemetry collection in UFM (Unified Fabric Manager) for streaming fabric data?
Correct
UFM Telemetry collection continuously streams real-time performance metrics, error counters, and health data from InfiniBand fabric components. This includes bandwidth utilization, packet rates, congestion indicators, temperature readings, and link quality metrics. The collected data enables administrators to monitor fabric health, detect anomalies, optimize performance, and troubleshoot issues across GPU clusters using InfiniBand networking for multi-node AI workloads.
Question 55 of 60
55. Question
Your data center uses EVPN-VXLAN with BGP for overlay networking across 32 spine-leaf switches. After a recent configuration change, some VTEPs cannot establish BGP sessions. You need to validate BGP/EVPN state across all devices using NetQ. Which protocol validation approach provides the most comprehensive verification of BGP neighbor relationships and EVPN route advertisement status?
Correct
Protocol validation in NetQ requires using purpose-built validation commands rather than query or monitoring tools. For BGP/EVPN verification, 'netq check bgp' validates BGP underlay health (sessions, neighbors, peering states) while 'netq check evpn' validates overlay health (VNI consistency, route types, VXLAN tunnels). These commands perform automated checks against expected protocol states and configuration consistency across the fabric. Other approaches like 'netq show', 'netq trace', or monitoring focus on data visibility or reachability testing but lack the comprehensive validation logic needed to identify protocol misconfigurations and inconsistencies in EVPN-VXLAN environments.
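As a minimal CLI sketch of the validation workflow described above (command names per NetQ's validation suite; available filters and options vary by NetQ release, so consult the version-specific command reference before relying on any flag):

```
# Run from a host with the NetQ CLI configured against the NetQ server.
netq check bgp     # validates underlay BGP: sessions, neighbors, peering state
netq check evpn    # validates overlay EVPN: VNI consistency, route types, VXLAN tunnels
```

Running both checks together covers the two failure domains in the scenario: the underlay sessions that some VTEPs cannot establish, and the overlay routes that depend on them.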
Question 56 of 60
56. Question
A network team is planning to upgrade NIC firmware across 200 GPU servers in their AI training cluster. They need to verify network performance before and after the upgrade. When would you use NetQ Change validation for pre/post change verification?
Correct
NetQ Change validation captures network state snapshots before planned changes, then compares post-change validation results against the baseline. This pre/post methodology identifies configuration drift, unexpected changes, or performance degradation caused by firmware upgrades, enabling quick rollback decisions. The comparison approach differentiates intentional changes from unintended side effects across distributed infrastructure.
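To make the pre/post comparison idea concrete, here is a toy illustration of the baseline-diff logic — this is NOT NetQ code, and all keys and values (interface speeds, session counts, firmware strings) are hypothetical placeholders:

```python
# Toy version of pre/post change validation (illustrative only, not NetQ):
# snapshot key state before the firmware upgrade, snapshot again after,
# and diff the two to separate intended changes from side effects.

def diff_snapshots(before: dict, after: dict) -> dict:
    """Return {key: (old, new)} for every value that changed."""
    keys = before.keys() | after.keys()
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}

# Hypothetical state captured before and after a NIC firmware upgrade:
pre = {"swp1.speed": "400G", "bgp.sessions_up": 64, "nic.fw": "28.39.1002"}
post = {"swp1.speed": "200G", "bgp.sessions_up": 64, "nic.fw": "28.41.1000"}

drift = diff_snapshots(pre, post)
# nic.fw changing is the intended upgrade; swp1 renegotiating down to 200G
# is the kind of unintended side effect the comparison surfaces.
print(drift)
```

The same principle drives a rollback decision: if the diff contains anything beyond the change you planned, the upgrade gets investigated before proceeding to the next batch of servers.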
Question 57 of 60
57. Question
Your multi-node H100 cluster experiences intermittent packet loss during distributed LLM training over RoCE v2 fabric, causing NCCL timeouts. Network monitoring shows buffer overflows during AllReduce operations. Which DCQCN parameter adjustment would most effectively address this congestion control issue?
Correct
DCQCN tuning for RoCE fabrics requires balancing congestion response with throughput. Buffer overflows during NCCL operations indicate insufficient rate reduction before saturation. Decreasing ECN thresholds enables early congestion detection, while increasing alpha parameter strengthens rate reduction response. This prevents deep buffer occupancy that causes packet drops and training disruptions. Conservative DCQCN tuning is critical for synchronized collective communication patterns in distributed AI workloads.
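A toy numeric sketch of the rate-decrease arithmetic commonly described for DCQCN (function and parameter names are illustrative, not an NVIDIA API): when a sender receives a CNP, it updates its congestion estimate alpha and cuts its rate in proportion to alpha, so a larger alpha means a stronger back-off.

```python
# Sketch of DCQCN's per-CNP rate decrease (names are illustrative):
# alpha is the sender's congestion estimate, g its update gain.

def cnp_received(rate_mbps: float, alpha: float, g: float = 1 / 256):
    """Return (new_rate, new_alpha) after one CNP arrives."""
    alpha = (1 - g) * alpha + g          # estimate grows toward 1 under congestion
    rate = rate_mbps * (1 - alpha / 2)   # larger alpha -> deeper rate cut
    return rate, alpha

# Two senders at 100 Gb/s: the one configured with the larger alpha
# backs off harder on the same CNP, draining buffers sooner.
r_weak, _ = cnp_received(100_000, alpha=0.1)
r_strong, _ = cnp_received(100_000, alpha=0.9)
print(f"alpha=0.1 -> {r_weak:.0f} Mb/s, alpha=0.9 -> {r_strong:.0f} Mb/s")
```

This is why increasing alpha, combined with earlier ECN marking, relieves deep buffer occupancy: senders shed rate before the buffers overflow rather than after.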
Question 58 of 60
58. Question
A team is configuring data parallelism for distributed training of a 70B parameter LLM across 16 H100 GPUs spanning 4 nodes. Which network configuration is MOST critical for efficient data parallel gradient synchronization?
Correct
Data parallel training synchronizes gradients across all GPUs after each backward pass, making inter-node network bandwidth and latency critical. InfiniBand HDR with GPUDirect RDMA provides optimal performance by enabling direct GPU-to-GPU communication, bypassing CPU overhead. NCCL, the standard for NVIDIA GPU collective operations, maximizes efficiency with InfiniBand's RDMA capabilities, essential for 4-node distributed training where gradient volumes are substantial.
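To see why the gradient volumes are substantial, here is a back-of-envelope calculation (my own arithmetic, not a figure from NVIDIA documentation) using the standard ring all-reduce cost of 2*(N-1)/N times the payload per rank:

```python
# Bytes each GPU must transmit per optimizer step when all-reducing the
# gradients of a 70B-parameter model across 16 ranks with a ring all-reduce.

def ring_allreduce_bytes_per_rank(payload_bytes: float, n_ranks: int) -> float:
    # Ring all-reduce: each rank sends 2*(N-1)/N * S bytes for S bytes of data.
    return 2 * (n_ranks - 1) / n_ranks * payload_bytes

params = 70e9                 # 70B-parameter model
grad_bytes = params * 2       # fp16 gradients, 2 bytes per parameter
per_gpu = ring_allreduce_bytes_per_rank(grad_bytes, n_ranks=16)
print(f"{per_gpu / 1e9:.1f} GB sent per GPU per step")  # ~262.5 GB
```

Hundreds of gigabytes moved on every step, much of it crossing node boundaries, is what makes InfiniBand bandwidth with GPUDirect RDMA the deciding factor rather than any intra-node link.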
Question 59 of 60
59. Question
Your datacenter runs distributed AI training workloads with 8-node H100 clusters using NCCL over RoCE Ethernet. Traffic patterns show periodic all-reduce bursts causing packet loss during synchronization phases. When would deep buffer Spectrum switches be most beneficial for this environment?
Correct
Deep buffer Spectrum switches are purpose-built for handling bursty, synchronized traffic patterns common in distributed AI training. During NCCL all-reduce operations, multiple nodes simultaneously transmit gradient updates, creating micro-bursts that can overwhelm standard switch buffers. Deep buffers (several MB capacity) absorb these bursts without packet loss, maintaining high throughput and low tail latency. This is particularly critical for RoCE environments where burst absorption prevents the need for aggressive PFC or ECN mechanisms that can introduce performance penalties.
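The incast arithmetic behind the micro-burst claim can be sketched as follows (an assumption-laden illustration, not a Spectrum datasheet figure — the chunk size and port speeds are hypothetical):

```python
# During an all-reduce phase, several peers burst into one receiver's
# egress port at once; the switch must buffer whatever arrives faster
# than the egress line rate can drain.

def buffer_needed_bytes(n_senders, chunk_bytes, ingress_gbps, egress_gbps):
    burst = n_senders * chunk_bytes                       # bytes arriving in the burst
    arrive_s = chunk_bytes * 8 / (ingress_gbps * 1e9)     # senders transmit in parallel
    drained = egress_gbps * 1e9 / 8 * arrive_s            # bytes egress drains meanwhile
    return max(0.0, burst - drained)

# 7 peers each bursting a 1 MB gradient chunk at 400 Gb/s into a single
# 400 Gb/s egress port:
need = buffer_needed_bytes(7, 1_000_000, 400, 400)
print(f"{need / 1e6:.0f} MB of buffer to absorb the burst without loss")
```

Even this small synchronized burst needs several megabytes of shared buffer at the hot port, which is exactly the regime where deep-buffer switches avoid drops without leaning on aggressive PFC/ECN.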
Question 60 of 60
60. Question
An administrator configures bond0 on a Cumulus Linux switch with interfaces swp1-swp4, but the bond fails to pass traffic after reboot. The configuration shows 'bond-slaves glob swp[1-4]' in /etc/network/interfaces, but 'ip link show bond0' displays 'NO-CARRIER'. What is the most likely cause?
Correct
In Cumulus Linux interface integration, bond member interfaces require explicit 'auto' declarations in /etc/network/interfaces to initialize at boot time. The configuration shows correct bond-slaves glob syntax, but without 'auto swp1', 'auto swp2', etc., these physical interfaces never come up administratively, preventing them from joining bond0. This causes the persistent NO-CARRIER state since the bond has no active members. Proper configuration requires both the bond definition with members and individual auto stanzas for each member interface to ensure initialization order and successful aggregation.
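A minimal /etc/network/interfaces sketch consistent with the explanation above — verify the exact stanzas against your Cumulus Linux release before applying:

```
# /etc/network/interfaces (sketch). Each member needs its own 'auto'
# stanza so ifupdown2 brings it up at boot before it can join the bond.
auto swp1
iface swp1

auto swp2
iface swp2

auto swp3
iface swp3

auto swp4
iface swp4

auto bond0
iface bond0
    bond-slaves glob swp[1-4]
```

After correcting the file, 'ifreload -a' applies the configuration and the bond members should enumerate under 'ip link show bond0'.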