NVIDIA NCP-AIN Practice Test 6
Question 1 of 60
A distributed training cluster using 16 H100 GPUs across 4 DGX nodes requires direct GPU memory access for gradient synchronization without involving remote CPUs. In an RDMA over InfiniBand implementation, which approach achieves one-sided communication operations for this workload?
Explanation:
RDMA Write operations enable one-sided communication by allowing the source node to directly write data into remote GPU memory without remote CPU involvement. This is critical for GPUDirect RDMA in multi-node training clusters, where NCCL leverages RDMA Write for efficient all-reduce operations. Unlike two-sided Send/Receive verbs requiring receiver participation, RDMA Write minimizes latency and CPU overhead, making it optimal for gradient synchronization workloads.
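The one-sided/two-sided distinction can be illustrated with a minimal stdlib-Python model (purely conceptual, not the ibverbs API): an RDMA Write lands in remote memory with no receiver action, while a two-sided Send fails unless the receiver's CPU has posted a receive buffer first.

```python
# Conceptual model only (not ibverbs): contrasts one-sided RDMA Write,
# where the initiator places bytes directly into remote memory, with
# two-sided Send/Receive, where the receiver must post a buffer first.

class Node:
    def __init__(self, size):
        self.memory = bytearray(size)   # stands in for a registered memory region
        self.posted_receives = []       # receive buffers posted by the local CPU

    def post_receive(self, offset):
        self.posted_receives.append(offset)

def rdma_write(remote, offset, data):
    """One-sided: the remote CPU takes no action; the initiator writes directly."""
    remote.memory[offset:offset + len(data)] = data

def send(remote, data):
    """Two-sided: fails unless the remote CPU has posted a receive."""
    if not remote.posted_receives:
        raise RuntimeError("RNR: receiver has not posted a receive buffer")
    offset = remote.posted_receives.pop(0)
    remote.memory[offset:offset + len(data)] = data

peer = Node(64)
rdma_write(peer, 0, b"gradients")       # succeeds with zero receiver involvement
try:
    send(peer, b"hello")                # receiver never posted a receive
except RuntimeError as e:
    print(e)
```

This is why gradient pushes over RDMA Write avoid remote-CPU scheduling entirely, whereas Send/Receive couples every transfer to receiver-side work.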
Question 2 of 60
During InfiniBand fabric initialization, a new compute node with dual-port HCA connects to the network but fails to establish communication with other nodes. Subnet management traffic shows multiple SM discovery packets but no assigned LID. When would SM discovery help diagnose this fabric initialization failure?
Explanation:
SM discovery is the fundamental process during InfiniBand fabric initialization where HCA ports identify and establish communication with active Subnet Managers to receive Local Identifier (LID) assignments. Without successful SM discovery, nodes cannot participate in the fabric. This diagnostic technique specifically addresses management plane connectivity issues during fabric bring-up, distinct from data plane optimizations or GPU-specific technologies that require an already-initialized fabric.
Question 3 of 60
A network administrator notices that the UFM fabric map displays inconsistent topology views across different monitoring sessions, with some switches and cables appearing intermittently. The physical infrastructure has not changed. What is the critical component that must be verified to ensure accurate and consistent fabric map visualization?
Explanation:
UFM fabric map accuracy depends critically on proper SNMP polling configuration. Inconsistent topology visualization typically results from polling intervals that are mismatched with device response characteristics, causing timeouts and incomplete discovery cycles. UFM must successfully query all network elements during each discovery iteration to maintain consistent fabric views. When polling is too aggressive or network latency varies, some devices fail to respond within timeout windows, creating intermittent visibility gaps. Proper SNMP tuning ensures complete topology data collection across all monitoring sessions, eliminating visualization inconsistencies.
Question 4 of 60
What is a Completion Queue (CQ) in the context of RDMA over InfiniBand?
Explanation:
Completion Queues (CQs) are fundamental RDMA structures that receive Work Completion entries when operations complete on associated Queue Pairs. Applications poll or wait on CQs to determine operation status (success/failure), enabling efficient asynchronous processing. Multiple Queue Pairs can share a single CQ, providing flexible resource management and scalable completion handling in high-performance RDMA environments.
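The poll-driven flow can be sketched with plain Python objects (a conceptual model, not the verbs API): several queue pairs share one completion queue, and the application drains completions with a poll call analogous to `ibv_poll_cq()`.

```python
# Conceptual sketch of a shared Completion Queue (not the verbs API).
from collections import deque

class CompletionQueue:
    def __init__(self):
        self._entries = deque()

    def push(self, wc):
        self._entries.append(wc)

    def poll(self, max_entries):
        """Return up to max_entries work completions, oldest first."""
        out = []
        while self._entries and len(out) < max_entries:
            out.append(self._entries.popleft())
        return out

class QueuePair:
    def __init__(self, qp_num, cq):
        self.qp_num = qp_num
        self.cq = cq                    # many QPs may share this one CQ

    def post_send(self, wr_id):
        # Work executes asynchronously; the completion lands on the shared CQ.
        self.cq.push({"wr_id": wr_id, "qp_num": self.qp_num, "status": "SUCCESS"})

cq = CompletionQueue()
qps = [QueuePair(n, cq) for n in range(3)]
for qp in qps:
    qp.post_send(wr_id=qp.qp_num * 10)
completions = cq.poll(16)               # completions from all three QPs
```

Sharing one CQ across queue pairs is what lets an application service many connections with a single polling loop.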
Question 5 of 60
Your team is implementing distributed training for a 70B parameter LLM across 16 H100 GPUs using tensor parallelism. During backward propagation, gradient shards need to be distributed to all GPUs before the optimizer step. Which NCCL collective pattern should you configure to efficiently synchronize these gradient shards across all ranks?
Explanation:
AllGather is the optimal collective pattern for distributing gradient shards in tensor parallel training. During backward propagation, each GPU computes gradients for its tensor shard. AllGather efficiently collects these shards from all ranks and distributes complete gradient tensors to every GPU, enabling synchronized optimizer updates. NCCL 2.20+ implements AllGather with ring or tree algorithms optimized for NVLink 4.0 topology on H100 clusters, achieving higher bandwidth than sequential gather-broadcast approaches.
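The ring variant of the pattern can be illustrated with a toy simulation (pure Python, not NCCL; NCCL's real implementation pipelines chunks over NVLink/InfiniBand). Each rank starts with one shard and, after N-1 steps of passing to its right-hand neighbour, every rank holds all N shards.

```python
# Toy ring AllGather over N ranks. At step s, rank r forwards the shard it
# received in the previous step, shard index (r - s) mod N, to rank (r+1) mod N.

def ring_allgather(shards):
    n = len(shards)
    # buffers[r][i] is rank r's copy of shard i (None until received)
    buffers = [[shards[r] if i == r else None for i in range(n)] for r in range(n)]
    for step in range(n - 1):
        # Compute all sends first, then deliver, to model simultaneous exchange.
        sends = []
        for r in range(n):
            idx = (r - step) % n          # the shard rank r forwards this step
            sends.append((idx, buffers[r][idx]))
        for r in range(n):
            src = (r - 1) % n             # receive from the left neighbour
            idx, data = sends[src]
            buffers[r][idx] = data
    return buffers

result = ring_allgather(["g0", "g1", "g2", "g3"])
assert all(row == ["g0", "g1", "g2", "g3"] for row in result)
```

Note each rank sends and receives one shard per step, so link utilization stays balanced — the property that makes ring AllGather bandwidth-optimal for large messages.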
Question 6 of 60
An AI infrastructure team is designing a GPU cluster with 128 H100 GPUs across 16 nodes for multi-tenant LLM inference workloads. They need to optimize GPU-to-GPU communication while maintaining fault isolation between tenant workloads. Which rail-optimized topology configuration best achieves low-latency communication with workload isolation?
Explanation:
Rail-optimized design for GPU clusters requires balancing bandwidth, fault isolation, and scalability. Dual-rail topology with dedicated InfiniBand fabrics per rail provides optimal multi-tenant isolation while NCCL automatically load-balances traffic across rails, delivering 2x aggregate bandwidth. This configuration eliminates single points of failure, prevents cross-tenant interference, and scales efficiently for large LLM inference deployments requiring consistent low-latency GPU-to-GPU communication across multiple nodes.
Question 7 of 60
What is the primary purpose of the perfquery tool in InfiniBand fabric troubleshooting?
Explanation:
perfquery is the standard InfiniBand diagnostic utility for querying port performance counters including transmitted/received packets, errors, and discards. It provides critical visibility into fabric health by exposing metrics like PortXmitData, PortRcvErrors, and SymbolErrorCounter, enabling administrators to identify bottlenecks, errors, and performance degradation. This read-only tool is essential for troubleshooting but does not perform configuration or traffic management functions.
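A small parser shows how such counter output is typically consumed in monitoring scripts. The sample text mimics perfquery's "Name:....value" layout, but the exact field set and formatting vary by OFED and firmware version, so treat it as illustrative.

```python
# Parse perfquery-style counter output and flag nonzero error counters,
# which is how admins spot failing links. Sample output is illustrative.
import re

SAMPLE = """\
# Port counters: Lid 12 port 1
PortXmitData:....................8457219
PortRcvData:.....................8123004
SymbolErrorCounter:..............0
LinkDownedCounter:...............0
PortRcvErrors:...................17
PortXmitDiscards:................0
"""

def parse_counters(text):
    counters = {}
    for m in re.finditer(r"^(\w+):\.*(\d+)$", text, re.MULTILINE):
        counters[m.group(1)] = int(m.group(2))
    return counters

# Counters where any nonzero value indicates a link-health problem.
ERROR_FIELDS = {"SymbolErrorCounter", "LinkDownedCounter",
                "PortRcvErrors", "PortXmitDiscards"}

counters = parse_counters(SAMPLE)
problems = {k: v for k, v in counters.items() if k in ERROR_FIELDS and v > 0}
print(problems)   # {'PortRcvErrors': 17}
```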
Question 8 of 60
You are configuring an 8-node DGX H100 cluster with InfiniBand HDR networking for distributed LLM training. The cluster uses a fat-tree topology with multiple paths between leaf and spine switches. When would fat-tree routing with topology-aware path selection provide the most benefit for NCCL all-reduce operations?
Explanation:
Fat-tree routing with topology-aware path selection maximizes benefits during multi-node distributed training with high communication concurrency. When NCCL performs frequent all-reduce operations with small batches, multiple concurrent messages traverse the fabric simultaneously. Topology-aware selection distributes traffic across multiple equal-cost paths between leaf and spine switches, preventing bottlenecks and utilizing aggregate fabric bandwidth. Single-node operations, local storage I/O, and inference workloads lack the concurrent inter-node collective communication patterns that justify fat-tree optimization.
Question 9 of 60
A multi-node H100 cluster experiences intermittent packet drops during large-scale LLM training over RoCE fabric. Network monitoring shows ECN-marked packets are being dropped instead of triggering congestion control. What is the most likely configuration issue?
Explanation:
Successful RoCE deployment for NCCL-based distributed training requires end-to-end ECN configuration. Switches must mark packets when congestion is detected (CE bit in IP header), and ConnectX NICs must have DCQCN (Data Center Quantized Congestion Notification) enabled to respond to these marks by reducing transmission rates. A common misconfiguration is enabling ECN on switches while leaving it disabled on endpoints, causing marked packets to be ignored until buffer overflow occurs, resulting in packet loss that severely impacts training throughput and convergence.
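The sender-side reaction DCQCN is supposed to provide can be sketched as a toy rate controller. The update rules are condensed from the published DCQCN scheme; the gain constant and timer handling here are illustrative simplifications, and real ConnectX firmware behavior differs.

```python
# Simplified DCQCN sender reaction: each Congestion Notification Packet (CNP)
# triggers a multiplicative rate decrease; CNP-free timer periods recover the
# rate back toward the pre-cut target. Constants are illustrative.

class DcqcnSender:
    G = 1 / 16                      # alpha gain (illustrative value)

    def __init__(self, line_rate_gbps):
        self.rate = line_rate_gbps  # current sending rate (Rc)
        self.target = line_rate_gbps
        self.alpha = 1.0            # congestion estimate, starts fully congested

    def on_cnp(self):
        """CNP received: remember current rate as target, then cut the rate."""
        self.target = self.rate
        self.rate *= (1 - self.alpha / 2)
        self.alpha = (1 - self.G) * self.alpha + self.G

    def on_timer_no_cnp(self):
        """Recovery: decay congestion estimate, move halfway back to target."""
        self.alpha = (1 - self.G) * self.alpha
        self.rate = (self.rate + self.target) / 2

s = DcqcnSender(400.0)
s.on_cnp()
print(round(s.rate, 1))   # 200.0 -- with alpha = 1, the first CNP halves the rate
```

Without this endpoint-side response, ECN marks at the switch accomplish nothing — which is precisely the misconfiguration the question describes.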
Question 10 of 60
An AI cluster with 64 H100 GPUs across 8 DGX nodes requires full bisection bandwidth for distributed LLM training to avoid communication bottlenecks during gradient synchronization. Which networking technology best achieves this requirement for multi-node GPU-to-GPU communication?
Explanation:
Full bisection bandwidth in AI clusters requires non-blocking network fabrics where any node can communicate with any other at full line rate simultaneously. InfiniBand NDR (400 Gbps) in fat-tree topology provides this capability for multi-node GPU clusters, essential for distributed training workloads with frequent all-reduce operations. GPUDirect RDMA eliminates CPU bottlenecks by enabling direct GPU memory access across nodes, maximizing NCCL efficiency for gradient synchronization during LLM training.
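The arithmetic behind "full bisection bandwidth" for this cluster is simple enough to check by hand; the one-400-Gb/s-NDR-link-per-GPU figure below is an assumption typical of rail-optimized designs, not something stated in the question.

```python
# Back-of-envelope bisection bandwidth for 8 DGX nodes x 8 H100 GPUs,
# assuming one 400 Gb/s NDR link per GPU and a non-blocking fat-tree.

nodes = 8
gpus_per_node = 8
link_gbps = 400                 # NDR per-port rate

total_injection = nodes * gpus_per_node * link_gbps   # aggregate injection bandwidth
# Full bisection: either half of the cluster can drive its entire injection
# bandwidth across the worst-case cut, so bisection = half the total.
bisection_gbps = total_injection // 2

print(total_injection, bisection_gbps)   # 25600 12800
```

A non-blocking fat-tree delivers that 12.8 Tb/s across the cut; any oversubscribed topology delivers proportionally less and throttles concurrent all-reduce traffic.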
Question 11 of 60
After restoring a network switch configuration from backup, the administrator notices that VLAN configurations persist but SNMP community strings and user passwords revert to defaults. The backup was created using the switch's built-in backup utility with default settings. What is the most likely cause of this incomplete configuration restoration?
Explanation:
Switch backup utilities implement security-by-design principles, excluding sensitive credentials (passwords, SNMP strings, certificates) from default backups to prevent unauthorized access if backup files are compromised. Administrators must explicitly enable credential backup using specific flags or configuration options. This behavior explains why network configurations (VLANs, routing) restore successfully while security parameters revert to defaults. Best practice requires documenting credential backup requirements and using secure storage for backup files containing sensitive data.
Question 12 of 60
Your AI training cluster experiences performance degradation during multi-node AllReduce operations on large gradient tensors. Which Spectrum-X AI optimization feature should you configure to accelerate collective communication patterns and reduce training iteration time?
Explanation:
Spectrum-X's SHARP technology performs in-network collective operations directly in the switch fabric, aggregating gradient tensors during AllReduce without involving end-host CPUs or GPUs. This collective acceleration reduces network traffic by up to 50% and accelerates training iterations by performing computational aggregation at line rate. SHARP is specifically designed for AI workloads requiring frequent multi-node synchronization, making it the optimal configuration for collective acceleration.
Question 13 of 60
A network engineer needs to optimize NVUE CLI operations for managing 50+ Cumulus Linux switches in production. The team reports slow configuration commits and difficulty tracking changes. Which approach BEST optimizes NVUE CLI for this scale?
Explanation:
Optimizing NVUE CLI at scale requires leveraging atomic commits with pre-validation workflows. Using 'nv config diff' before 'nv config apply --assume-yes' provides change visibility while enabling automation. NVUE's transactional architecture batches changes efficiently in memory, and atomic commits ensure consistency across configuration elements. Revision-based tracking allows auditing and rollback without manual state management. Approaches that bypass transactions or add confirmation delays degrade performance and undermine NVUE's core optimization model for managing large-scale Cumulus Linux deployments.
Question 14 of 60
What is the primary benefit of using GPUDirect RDMA for maximizing throughput in multi-node distributed training workloads?
Explanation:
GPUDirect RDMA maximizes throughput by enabling direct GPU-to-GPU memory transfers across nodes over InfiniBand or RoCE networks, completely bypassing the CPU and system memory. This eliminates expensive memory copy operations and reduces communication latency, which is critical for multi-node distributed training where frequent gradient synchronization occurs. NCCL leverages GPUDirect RDMA automatically when available.
Question 15 of 60
What is port configuration in the context of a ConnectX HCA?
Explanation:
Port configuration for ConnectX HCA involves setting critical physical layer parameters including link speed (e.g., 100/200/400Gb/s), operating mode (InfiniBand vs Ethernet), and protocol-specific settings. These configurations determine how the adapter interfaces with the fabric infrastructure and must be properly aligned with switch capabilities and fabric topology for optimal RDMA performance and connectivity.
Question 16 of 60
A network engineer is integrating gNMI-based streaming telemetry with NVIDIA GPU fabric monitoring. The system uses gRPC over TLS for real-time telemetry data from NVLink and InfiniBand switches. Which component is CRITICAL for processing high-frequency GPU interconnect metrics without overwhelming the telemetry collector?
Explanation:
Integrating gNMI streaming telemetry with GPU fabric monitoring requires client-initiated bidirectional gRPC streams using on-change subscriptions with path-specific filtering. This approach minimizes data volume by pushing updates only when metrics change (critical for error counters) while allowing dynamic subscription adjustments. For high-frequency metrics like NVLink bandwidth, targeted paths prevent overwhelming collectors with unnecessary data from thousands of available switch metrics. gRPC's native streaming and Protocol Buffers serialization provide efficient transport, but the subscription model and path filtering are the critical components for managing GPU interconnect telemetry at scale.
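The subscription shape can be sketched as a plain dict that mirrors the fields of the gNMI SubscribeRequest protobuf. The OpenConfig-style paths below are illustrative examples only (real switch models vary), and a real client such as pygnmi would serialize this over gRPC/TLS rather than pass a dict.

```python
# Dict-shaped sketch of a gNMI STREAM subscription using ON_CHANGE mode with
# targeted paths -- the pattern described above for keeping collector load low.

def on_change_subscription(paths):
    return {
        "subscribe": {
            "mode": "STREAM",                      # long-lived stream, not a one-shot poll
            "encoding": "PROTO",                   # Protocol Buffers payloads
            "subscription": [
                {"path": p, "mode": "ON_CHANGE"}   # push only when the value changes
                for p in paths
            ],
        }
    }

# Subscribe only to the error counters we care about, not the whole tree.
req = on_change_subscription([
    "interfaces/interface[name=swp1]/state/counters/in-errors",
    "interfaces/interface[name=swp1]/state/counters/out-discards",
])
print(len(req["subscribe"]["subscription"]))   # 2
```

Swapping `ON_CHANGE` for `SAMPLE` with a sample interval is the usual choice for continuously varying gauges like link bandwidth, while error counters stay on-change.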
Question 17 of 60
Your multi-node H100 training cluster uses NCCL over InfiniBand with GPUDirect RDMA for distributed LLM training. During memory registration for RDMA transfers, you need to minimize registration overhead while ensuring efficient zero-copy GPU-to-GPU communication. Which memory registration approach should you configure?
Correct
For distributed training with GPUDirect RDMA, persistent memory registration is essential. Pin GPU memory buffers at allocation and register them with the InfiniBand adapter for the duration of training. This eliminates registration overhead during frequent NCCL operations, enables efficient zero-copy transfers, and maximizes throughput. NCCL automatically manages persistent registrations when GPUDirect is properly configured with supported IB adapters and CUDA.
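The amortization argument can be sketched with a toy registration cache: pay the registration cost once per buffer, then reuse the handle for every transfer. `register_fn` is a stand-in for a verbs-level registration call, not a real API.

```python
# Persistent-registration sketch: register each buffer once at allocation
# time, cache the memory-region handle, and reuse it for all later
# transfers instead of re-registering per operation.
class RegistrationCache:
    def __init__(self, register_fn):
        self._register = register_fn
        self._cache = {}          # buffer id -> memory-region handle
        self.registrations = 0    # count of expensive registration calls

    def get_mr(self, buf_id):
        if buf_id not in self._cache:
            self._cache[buf_id] = self._register(buf_id)
            self.registrations += 1
        return self._cache[buf_id]

cache = RegistrationCache(register_fn=lambda b: f"mr-{b}")
for _ in range(1000):             # 1000 collective operations on one gradient buffer
    cache.get_mr("grad_buffer")
print(cache.registrations)        # -> 1 (registration cost paid once, not 1000 times)
```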
Question 18 of 60
18. Question
A data center network architect is deploying EVPN-VXLAN overlay for multi-tenant GPU compute clusters. The design requires automatic discovery of VXLAN tunnel endpoints (VTEPs) and distribution of MAC/IP bindings across spine-leaf fabric. Which BGP EVPN control plane configuration approach achieves this requirement?
Correct
EVPN-VXLAN control plane setup requires MP-BGP with L2VPN EVPN address family configuration. Spine switches act as BGP route reflectors distributing EVPN routes (Type 2 for MAC/IP, Type 3 for VTEP discovery) to leaf VTEP clients. This enables scalable, control-plane based learning without data-plane flooding, essential for GPU cluster fabrics with high east-west traffic.
Question 19 of 60
19. Question
What is the primary purpose of configuring SM (Subnet Manager) via UFM in an InfiniBand fabric?
Correct
SM via UFM provides centralized management of InfiniBand subnet manager functions, enabling administrators to configure, monitor, and control subnet topology, routing policies, and fabric operations from UFM's unified interface. This centralization simplifies fabric management in large-scale GPU clusters and ensures consistent subnet configuration across the InfiniBand network infrastructure.
Question 20 of 60
20. Question
Your AI inference application requires ultra-low latency packet processing for real-time feature extraction from network streams on ConnectX-7 NICs. The application experiences kernel network stack overhead causing latency spikes. How should you configure DPDK integration to achieve optimal data plane acceleration?
Correct
DPDK integration for data plane acceleration requires poll mode drivers with dedicated CPU core pinning to bypass the kernel network stack entirely. This eliminates interrupt handling overhead, context switches, and system calls, achieving deterministic microsecond-level latency. DPDK's zero-copy mechanisms and user-space packet processing are essential for real-time AI inference workloads requiring consistent low-latency network I/O performance.
Question 21 of 60
21. Question
A network administrator needs to verify the operational status of InfiniBand ports on an Onyx switch to troubleshoot connectivity issues between GPU compute nodes. Which Onyx CLI command provides comprehensive port state information including physical layer status and link speed?
Correct
Onyx Switch OS provides specialized CLI commands for managing InfiniBand infrastructure in GPU clusters. The 'show interfaces ib status' command is the standard approach for verifying InfiniBand port operational states, displaying link speeds, physical layer status, and error counters. This command is essential for troubleshooting GPUDirect RDMA connectivity issues between compute nodes in AI training clusters using InfiniBand fabrics with NCCL communication.
Question 22 of 60
22. Question
What is the primary purpose of BlueField-3 DPU's InfiniBand capabilities in AI infrastructure?
Correct
BlueField-3 DPUs with InfiniBand provide critical network acceleration for AI infrastructure by offloading network processing from CPUs and enabling GPUDirect RDMA. This allows direct GPU-to-GPU memory transfers across nodes, bypassing the CPU and dramatically reducing latency for distributed training. The DPU handles NCCL collective operations, InfiniBand transport, and congestion management, making it essential for multi-node H100/H200 clusters performing large-scale LLM training.
Question 23 of 60
23. Question
An InfiniBand fabric administrator needs to implement trending and reporting for UFM historical analysis to track fabric performance degradation over the past 90 days. What is the critical component required to enable effective long-term trending and data correlation across multiple fabric events?
Correct
Effective UFM historical analysis for trending and reporting fundamentally requires a persistent time-series database infrastructure. This component enables long-term data retention (90+ days), efficient indexed queries for correlation analysis, and structured storage of performance metrics. Without persistent telemetry storage, historical trends cannot be established, performance degradation patterns remain undetected, and capacity planning becomes reactive rather than predictive. Time-series databases provide the foundation for visualizing trends, correlating fabric events with performance changes, and generating comprehensive reports that inform proactive fabric management decisions.
Question 24 of 60
24. Question
You are configuring a multi-node H100 cluster for distributed LLM training with 128 GPUs across 16 nodes connected via InfiniBand NDR. Which SHARP architecture configuration optimally reduces AllReduce latency for gradient synchronization?
Correct
SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) architecture uses Aggregation Nodes deployed on InfiniBand switches to perform gradient reduction operations directly within the network fabric. Tree-based reduction patterns enable hierarchical aggregation across multiple switch tiers, offloading AllReduce computation from GPUs and CPUs. For 128-GPU training, SHARP Aggregation Nodes reduce network traffic and AllReduce latency by performing in-network computing, critical for efficient multi-node distributed training on H100 clusters.
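The latency benefit of tree aggregation can be seen from step counts alone. This back-of-envelope sketch assumes a balanced binary aggregation tree; real SHARP trees follow the physical switch topology.

```python
import math

# A balanced binary reduction tree combines N contributions in ceil(log2 N)
# aggregation steps, versus N-1 sequential combines for a naive gather.
def tree_reduction_steps(n_endpoints):
    return math.ceil(math.log2(n_endpoints))

def naive_gather_steps(n_endpoints):
    return n_endpoints - 1

# 128 GPUs: 7 in-network aggregation levels instead of 127 sequential combines.
print(tree_reduction_steps(128), naive_gather_steps(128))
```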
Question 25 of 60
25. Question
A data science team is training a 70B parameter LLM across 8 H100 GPUs in a DGX system using tensor parallelism. During backpropagation, gradient synchronization across all GPUs is creating a communication bottleneck. Which collective communication operation would most efficiently aggregate and distribute gradients across all GPUs?
Correct
All-reduce is the standard collective operation for distributed training gradient synchronization because it efficiently combines reduction and broadcast in a single operation. NCCL's all-reduce implementation uses optimized ring or tree algorithms over NVLink 4.0, achieving near-linear scaling across 8 GPUs. This approach minimizes communication overhead during backpropagation compared to separate operations or manual implementations, making it essential for multi-GPU LLM training.
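The ring algorithm mentioned above can be sketched in plain Python: a reduce-scatter phase followed by an all-gather phase, each taking n-1 steps. This is a toy model of the algorithm, not NCCL's implementation; each "GPU" is a list with one gradient chunk per peer.

```python
# Toy ring all-reduce. After reduce-scatter, GPU i holds the fully reduced
# chunk (i+1) % n; all-gather then circulates each reduced chunk around the
# ring so every GPU ends with the complete summed gradient.
def ring_allreduce(buffers):
    n = len(buffers)
    chunks = [list(b) for b in buffers]   # chunks[i][j]: GPU i's copy of chunk j
    # Reduce-scatter: at each step, GPU i sends chunk (i - step) % n rightward.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, chunks[i][(i - step) % n]) for i in range(n)]
        for i, c, val in sends:
            chunks[(i + 1) % n][c] += val   # right neighbor accumulates
    # All-gather: circulate each fully reduced chunk to every GPU.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n]) for i in range(n)]
        for i, c, val in sends:
            chunks[(i + 1) % n][c] = val    # right neighbor overwrites
    return chunks

grads = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(ring_allreduce(grads))  # every "GPU" ends with [28, 32, 36, 40]
```

Each GPU sends and receives only 1/n of the data per step, which is why the ring variant is bandwidth-optimal for large gradient buffers.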
Question 26 of 60
26. Question
What is the primary purpose of distinguishing between bandwidth and throughput when evaluating network performance in AI infrastructure?
Correct
Understanding the difference between bandwidth and throughput is critical for network performance evaluation. Bandwidth represents the theoretical maximum data rate a connection supports, while throughput measures actual data transfer achieved under real conditions with factors like latency, congestion, and overhead. This distinction helps AI practitioners accurately assess whether network infrastructure meets requirements for distributed training or multi-GPU workloads.
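The distinction can be made concrete with a back-of-envelope goodput calculation. The 78-byte per-frame overhead below is an illustrative TCP-over-Ethernet figure (TCP/IP headers plus framing, preamble, and inter-frame gap), not a measurement.

```python
# Goodput estimate: line rate (bandwidth) scaled by payload efficiency.
# Throughput measured at the application is always below the nominal rate.
def goodput_gbps(line_rate_gbps, payload_bytes, overhead_bytes=78):
    return line_rate_gbps * payload_bytes / (payload_bytes + overhead_bytes)

# A "100 Gb/s" link carrying 1460-byte payloads never delivers 100 Gb/s of data:
print(round(goodput_gbps(100, 1460), 1))  # -> 94.9
```

Congestion, retransmissions, and protocol pacing push real throughput further below this ceiling, which is why measuring throughput (not quoting bandwidth) is what validates a distributed-training fabric.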
Question 27 of 60
27. Question
What is the primary purpose of Quality of Service (QoS) configuration via the Subnet Manager in an InfiniBand fabric?
Correct
QoS via Subnet Manager is a critical InfiniBand feature that enables traffic prioritization through Service Level to Virtual Lane mappings. This mechanism allows administrators to assign different traffic classes (GPU communication, storage, management) to specific Virtual Lanes with guaranteed bandwidth and latency characteristics, preventing lower-priority traffic from impacting critical workloads in converged high-performance computing environments.
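The SL-to-VL idea can be sketched as a pair of lookup tables of the kind the Subnet Manager programs into switches. Class names, service levels, and lane numbers below are arbitrary examples, not NVIDIA defaults.

```python
# Two-stage QoS lookup: a traffic class maps to a Service Level (carried in
# the packet header), and the switch's SL2VL table maps that SL to a
# Virtual Lane with its own buffers and arbitration weight.
SL_FOR_CLASS = {"gpu_compute": 0, "storage": 2, "management": 4}
SL2VL = {0: 0, 2: 1, 4: 3}  # service level -> virtual lane

def virtual_lane(traffic_class):
    """Resolve a traffic class to the VL its packets will ride on."""
    return SL2VL[SL_FOR_CLASS[traffic_class]]

print(virtual_lane("gpu_compute"))  # -> 0
```

Because each VL has dedicated buffering, storage bursts on VL 1 cannot consume the credits that GPU traffic on VL 0 depends on.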
Question 28 of 60
28. Question
What is the primary purpose of the BGP best path selection algorithm in data center networks?
Correct
The BGP best path selection algorithm systematically evaluates route attributes (LOCAL_PREF, AS_PATH, ORIGIN, MED, IGP cost) to select the single optimal path for each destination prefix. This deterministic process ensures consistent routing decisions across the network, enabling predictable traffic forwarding in data center fabrics with multiple redundant paths.
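A simplified version of the tie-breaker order can be expressed as a sort key. Real BGP evaluates more steps (origin, eBGP vs. iBGP, IGP cost, router ID, and others); this sketch covers only the first few attributes the explanation names.

```python
# Deterministic best-path selection: prefer highest LOCAL_PREF, then
# shortest AS_PATH, then lowest MED. Encoding the preferences as a tuple
# and taking min() mirrors the "evaluate attributes in order" process.
def best_path(routes):
    return min(
        routes,
        key=lambda r: (-r["local_pref"], len(r["as_path"]), r["med"]),
    )

routes = [
    {"via": "spine1", "local_pref": 100, "as_path": [65001, 65010], "med": 0},
    {"via": "spine2", "local_pref": 200, "as_path": [65002, 65020, 65030], "med": 0},
]
print(best_path(routes)["via"])  # -> spine2 (higher LOCAL_PREF beats shorter AS_PATH)
```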
Question 29 of 60
29. Question
A 128-node GPU cluster uses InfiniBand with multiple Subnet Managers for redundancy. After a network topology change, you observe unequal path utilization causing 30% throughput degradation in NCCL all-reduce operations. Which integration approach between Subnet Manager routing and path computation would resolve this most effectively?
Correct
The Subnet Manager's path computation algorithm must integrate with adaptive routing mechanisms to resolve unequal path utilization. Adaptive Routing enables the SM to compute and install multiple valid paths in switch forwarding tables, allowing hardware to dynamically select optimal paths based on real-time port congestion. This integration between centralized path computation (SM) and distributed path selection (switch hardware) is critical for GPU clusters where NCCL all-reduce patterns generate predictable but heavy traffic loads. Static routing approaches lack the runtime adaptability needed for dynamic workloads.
Question 30 of 60
30. Question
Your team is deploying a multi-node H100 training cluster using RoCE v2 over 100 GbE Ethernet. During initial testing, you observe significant performance degradation and packet loss during distributed training workloads with high all-reduce traffic. When would you implement Lossless Ethernet to ensure zero packet loss?
Correct
Lossless Ethernet with Priority Flow Control (PFC) is essential for RoCE deployments to prevent packet drops during network congestion. RoCE relies on RDMA semantics that expect zero packet loss; any drops trigger expensive end-to-end retransmissions that devastate training performance. PFC pauses traffic at the link layer when receiver buffers approach capacity, preventing drops during bursty all-reduce operations common in distributed training. This is critical for multi-GPU clusters where NCCL generates high-volume collective communication patterns.
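The XOFF/XON behavior PFC implements can be sketched with a toy receiver model: pause the upstream sender when a priority's buffer crosses a high watermark, resume when it drains below a low one. The thresholds are illustrative byte counts, not real Spectrum buffer defaults.

```python
# Per-priority PFC sketch: the receiver asserts PAUSE at the XOFF threshold
# (before the buffer overflows and drops) and releases at the XON threshold.
class PfcReceiver:
    def __init__(self, xoff=8000, xon=4000):
        self.xoff, self.xon = xoff, xon
        self.buffered = 0
        self.paused = False

    def enqueue(self, nbytes):
        self.buffered += nbytes
        if self.buffered >= self.xoff:
            self.paused = True        # send PAUSE frame upstream: no drops
        return self.paused

    def dequeue(self, nbytes):
        self.buffered = max(0, self.buffered - nbytes)
        if self.paused and self.buffered <= self.xon:
            self.paused = False       # resume traffic
        return self.paused
```

The XOFF threshold must leave enough headroom to absorb in-flight bytes that arrive after the PAUSE is sent, which is why real deployments size it from link speed and cable length.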
Question 31 of 60
31. Question
A NetQ deployment shows packet drops in WJH analysis with reason code "L2_MTU_ERROR" on spine switches, but application teams report no connectivity issues. Traffic analysis reveals fragmented packets arriving at leaf switches. What is the most likely cause of this WJH drop pattern?
Correct
WJH with NetQ provides integrated drop analysis by correlating hardware-level packet drop telemetry with fabric configuration state. L2_MTU_ERROR indicates packets exceeded configured MTU during L2 forwarding on spine switches. The pattern of fragmented packets at leafs with no application impact suggests MTU inconsistency: leafs accept larger frames, forward to spines with smaller MTU, causing drops. However, TCP MSS clamping causes sources to fragment, allowing application function despite infrastructure misconfiguration. NetQ's topology-aware WJH analysis identifies these cross-tier configuration discrepancies.
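The fragmentation side of this scenario is easy to quantify. The sketch below counts IPv4-style fragments for a hypothetical 9000-byte jumbo payload squeezed through a 1500-byte MTU (fragment payloads happen to be 8-byte aligned here, so the simplification does not change the count).

```python
import math

# Each fragment carries (MTU - IP header) bytes of the original payload.
def fragment_count(payload_bytes, mtu, ip_header=20):
    per_fragment = mtu - ip_header
    return math.ceil(payload_bytes / per_fragment)

# A jumbo-frame payload from a leaf-side host crossing a 1500-byte MTU path:
print(fragment_count(9000, 1500))  # -> 7
```

A sudden appearance of such fragments at the leaf tier, alongside spine-side L2_MTU_ERROR drops, is the signature of inconsistent MTU between tiers rather than an application fault.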
Question 32 of 60
32. Question
A network administrator needs to configure multiple VLANs on a Cumulus Linux switch for tenant isolation in a data center. The administrator must ensure layer 2 traffic separation across VLANs 100, 200, and 300 on interfaces swp1-swp10. Which configuration approach correctly implements VLAN-aware bridging for this scenario?
Correct
VLAN-aware bridging is the modern, scalable approach for multi-VLAN layer 2 configuration in Cumulus Linux. It uses a single bridge instance to manage multiple VLANs efficiently through the bridge.vids parameter, providing proper traffic isolation while minimizing resource consumption. This configuration aligns with Cumulus best practices for data center deployments requiring tenant isolation across multiple VLANs.
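A minimal sketch of such a configuration in ifupdown2 syntax (/etc/network/interfaces) might look like the fragment below. The VLAN IDs and port range come from the scenario; exact keywords and defaults can vary by Cumulus Linux release, so treat this as illustrative rather than a drop-in config.

```
auto bridge
iface bridge
    bridge-vlan-aware yes
    bridge-ports glob swp1-10
    bridge-vids 100 200 300
```

A single VLAN-aware bridge carries all three VLANs on swp1 through swp10, instead of one bridge instance per VLAN.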
Question 33 of 60
33. Question
A fabric manager deployment requires zero downtime for InfiniBand network monitoring across 512 H100 GPUs in a training cluster. The infrastructure team configured two UFM appliances with shared storage for topology data. What is the critical missing component for achieving true high availability?
Correct
UFM high availability requires a Virtual IP address with automatic failover as the critical component. While shared storage provides state synchronization between UFM instances, VIP ensures zero-downtime failover by allowing the secondary UFM to assume the primary's IP address automatically. This eliminates the need for managed switch reconfiguration or client reconnection during failover. NCCL, NVLink, and GPUDirect operate at different layers (GPU communication, intra-node interconnect, RDMA data plane) and don't participate in UFM's control plane redundancy architecture.
Question 34 of 60
34. Question
A data center is deploying a multi-rail InfiniBand fabric with NVIDIA Quantum-2 switches to support GPU clusters running distributed AI training. The network architect needs to configure adaptive routing and credit-based flow control across the switch fabric. When would switch fabric configuration for port and routing setup be most critical?
Correct
Switch fabric configuration for port and routing setup is most critical during initial fabric deployment, when establishing routing tables, adaptive routing policies, and buffer credit allocation. These configurations directly impact NCCL collective communication performance for distributed GPU training. Proper fabric setup ensures optimal load balancing, minimizes congestion, and maximizes throughput for multi-node AI workloads across the InfiniBand network.
Question 35 of 60
35. Question
What is the primary purpose of database scalability in UFM (Unified Fabric Manager) architecture when supporting large-scale InfiniBand fabrics?
Correct
Database scalability in UFM architecture is essential for supporting large InfiniBand fabrics with thousands of devices. It ensures the management platform can efficiently store topology data, telemetry metrics, and historical information without performance degradation. As fabric sizes grow in modern AI and HPC clusters with hundreds of switches and adapters, scalable database architecture prevents UFM from becoming a management bottleneck, maintaining responsive monitoring and configuration capabilities.
Question 36 of 60
36. Question
A NetQ validation check reports BGP EVPN session failures across multiple spine switches in your NVIDIA Spectrum fabric. The BGP sessions between leaf-spine appear established, but EVPN NLRI routes are not being exchanged. What is the most likely cause of this validation failure?
Correct
BGP EVPN requires explicit address family configuration beyond base BGP session establishment. The 'address-family l2vpn evpn' must be configured with neighbor activation on both peers for EVPN NLRI exchange. NetQ protocol validation checks verify not just BGP session state but proper EVPN capability negotiation and route advertisement. Common misconfigurations include enabling BGP without activating the l2vpn evpn address family, causing the exact symptom described where base sessions are established but EVPN routes aren't exchanged.
Question 37 of 60
37. Question
What distinguishes InfiniBand's physical and data link layers (Layer 1-2) from traditional Ethernet in high-performance computing environments?
Correct
InfiniBand's Layer 1-2 architecture is purpose-built for high-performance computing, featuring credit-based flow control for lossless transmission and native RDMA support at the physical layer. This differs fundamentally from Ethernet's best-effort delivery model, providing ultra-low latency essential for multi-GPU distributed training and GPUDirect RDMA operations in NVIDIA AI infrastructure.
Question 38 of 60
38. Question
A team is training a 175B parameter LLM across 64 H100 GPUs using NeMo Framework. The model is too large for a single GPU's memory. Which parallelism strategy combination in NeMo enables splitting model layers across GPUs while also partitioning individual layer tensors across multiple devices?
Correct
NeMo Framework leverages Megatron-LM's 3D parallelism for ultra-large models. Tensor parallelism splits weight matrices within layers across GPUs (intra-layer), while pipeline parallelism distributes sequential layers across devices (inter-layer). Combined with data parallelism, this enables training 175B+ parameter models. NCCL provides optimized collectives (all-reduce, all-gather) over NVLink 4.0 (900 GB/s) for efficient tensor synchronization within nodes and GPUDirect RDMA for cross-node communication.
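The parallelism arithmetic can be checked directly. The degrees below (TP=4, PP=4, DP=4) are one plausible factorization of 64 GPUs, chosen for illustration rather than taken from any NeMo recipe:

```python
# Illustrative 3D-parallelism factorization for 64 GPUs (assumed degrees).
TP = 4   # tensor parallel: weight matrices split across 4 GPUs (intra-layer)
PP = 4   # pipeline parallel: layer groups spread across 4 stages (inter-layer)
DP = 4   # data parallel: 4 replicas of the sharded model

total_gpus = TP * PP * DP
assert total_gpus == 64

# Each GPU holds roughly 1/(TP*PP) of the parameters
# (a simplification that ignores embeddings and uneven stage splits).
params = 175e9
per_gpu_params = params / (TP * PP)
print(f"{total_gpus} GPUs, ~{per_gpu_params / 1e9:.1f}B parameters per GPU")
```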
Question 39 of 60
39. Question
A multi-node H100 cluster is being configured with NVIDIA Quantum-2 switches to support large-scale LLM training workloads requiring maximum bandwidth between nodes. The infrastructure team needs to deploy InfiniBand connectivity supporting 400G and 800G speeds. What is the critical component that enables these NDR (400G) and XDR (800G) InfiniBand speeds in Quantum-2 switches?
Correct
NDR (400G) and XDR (800G) InfiniBand speeds in Quantum-2 switches are enabled by SerDes technology operating at 100 Gbps per lane. NDR uses 4 lanes (4x100G = 400G) and XDR uses 8 lanes (8x100G = 800G). This high-speed serialization/deserialization at the physical layer is critical for supporting the extreme bandwidth requirements of multi-node GPU clusters. Combined with GPUDirect RDMA and NCCL, these speeds enable efficient distributed training for large language models across H100/H200 clusters.
Question 40 of 60
40. Question
Your data center infrastructure uses Cumulus Linux switches for network fabric alongside NVIDIA GPU clusters. You need to monitor network performance, track link utilization, and correlate network issues with GPU training job performance. When would you use NetQ integration with Cumulus Linux?
Correct
NetQ integration with Cumulus Linux provides specialized network telemetry, fabric-wide visibility, and historical network state validation for data center environments. It excels at monitoring Cumulus Linux-based Ethernet fabrics, tracking network performance, and validating configurations, all critical for infrastructures supporting GPU training clusters. NetQ complements GPU monitoring tools by providing network-layer insights but doesn't replace configuration management tools, InfiniBand fabric managers, or GPU resource monitors.
Question 41 of 60
41. Question
A network engineer needs to configure VLAN-aware bridges on a Cumulus Linux switch using a declarative approach that validates configurations before applying them. Which CLI tool should be used to ensure atomicity and rollback capabilities?
Correct
NVUE is Cumulus Linux's modern CLI tool designed for declarative, transactional configuration management. Unlike vtysh (imperative, routing-focused) or direct file editing, NVUE validates configurations before applying, supports atomic commits, and enables easy rollback. For VLAN-aware bridge configuration requiring safety and validation, NVUE's 'nv set' commands with 'nv config apply' provide the necessary declarative approach and operational reliability.
Question 42 of 60
42. Question
What is Telemetry collection in the context of UFM Monitoring?
Correct
Telemetry collection in UFM Monitoring involves continuous, automated streaming of InfiniBand fabric performance data including port statistics, bandwidth utilization, error counters, and switch metrics. This real-time data flow enables immediate network visibility, allowing administrators to detect anomalies, optimize performance, and troubleshoot issues proactively rather than relying on periodic reports or manual log collection.
Question 43 of 60
43. Question
A network engineer is configuring BGP on Cumulus Linux switches using FRRouting for a leaf-spine architecture. After entering BGP neighbor configurations, the sessions fail to establish. What is the critical component missing from the FRRouting configuration workflow?
Correct
Cumulus Linux with FRRouting requires explicit configuration activation, distinguishing it from traditional network operating systems. When using NCLU (Network Command Line Utility), the 'net commit' command applies pending configurations to FRR daemons. When configuring directly via vtysh, changes must be activated through 'net commit' (if NCLU-managed) or FRR service restart. Without this activation step, BGP neighbor configurations remain uncommitted in the pending state, preventing session establishment. This workflow ensures atomic configuration changes and rollback capability, critical for production fabric deployments.
Question 44 of 60
44. Question
What is in-network reduction in the context of SHARP (Scalable Hierarchical Aggregation and Reduction Protocol)?
Correct
SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) enables in-network reduction by performing collective operations like all-reduce directly within InfiniBand switches. This hardware-accelerated approach offloads communication overhead from GPUs and CPUs, significantly reducing latency and improving bandwidth utilization during multi-GPU distributed training. SHARP is critical for scaling training workloads across large clusters with minimal communication bottlenecks.
Question 45 of 60
45. Question
A data center team managing a 64-node H100 cluster running distributed LLM training workloads needs to monitor NCCL collective operations across InfiniBand fabric to identify communication bottlenecks. Which technology should they deploy to capture real-time metrics on AllReduce and AllGather performance patterns?
Correct
UFM Telemetry with NCCL integration is the purpose-built solution for monitoring collective operations in NVIDIA GPU clusters over InfiniBand. It provides fabric-wide visibility into NCCL communication patterns, enabling teams to identify bottlenecks like congestion, imbalanced traffic, or suboptimal routing. While DCGM monitors GPU health and Nsight Systems profiles single-node performance, only UFM delivers the network-level telemetry required for distributed training optimization at scale.
Question 46 of 60
46. Question
A multi-node H100 training cluster experiences intermittent NCCL timeout errors during AllReduce operations. Initial network diagnostics show InfiniBand adapters are physically connected. When would using the ibstatus command be the MOST appropriate next troubleshooting step?
Correct
The ibstatus command is specifically designed to quickly verify InfiniBand port link states and operational status. In this scenario, physical connectivity is confirmed but NCCL operations fail, suggesting ports may not have completed link initialization to ACTIVE state. ibstatus immediately reveals if ports are DOWN, INIT, or ARMED rather than ACTIVE, pinpointing link negotiation failures. This makes it the correct first diagnostic step before investigating higher-layer issues like subnet management or RDMA configuration.
Question 47 of 60
47. Question
A network engineer is planning capacity for an AI training cluster with 16x H100 GPUs across 4 nodes connected via 400GbE RoCE network. Each node will perform all-reduce operations every 50ms during distributed training. When would calculating wire speed and line rate be most critical for validating this design?
Correct
Wire speed and line rate calculations are essential for validating network capacity under sustained load. In this multi-node training scenario, the 400GbE network must maintain maximum theoretical throughput (50GB/s line rate) during continuous NCCL all-reduce operations. Calculating wire speed ensures switches and NICs can forward packets at line rate without buffering or drops, critical for preventing network bottlenecks in GPU synchronization.
Question 48 of 60
48. Question
A network administrator is deploying UFM Cyber-AI to monitor a high-performance computing cluster with 500 InfiniBand switches. Which technology should be configured first to establish accurate baselines for normal behavior learning before anomaly detection can begin?
Correct
UFM Cyber-AI's baseline establishment relies on machine learning algorithms that observe network telemetry over an initial learning period (typically 7-14 days) to understand normal behavior patterns. This ML-based approach dynamically adapts to legitimate traffic variations, workload changes, and infrastructure updates, creating accurate behavioral models essential for subsequent anomaly detection. Static thresholds or manual methods cannot capture the complexity of modern HPC network behavior.
Question 49 of 60
49. Question
What is link flap in the context of Ethernet troubleshooting?
Correct
Link flap is the rapid and repeated oscillation of a network interface between operational (up) and non-operational (down) states. This instability causes network disruptions, triggers excessive logging of state change events, and prevents reliable connectivity. Common causes include faulty cables, transceiver issues, power fluctuations, or incompatible speed/duplex settings requiring systematic diagnosis.
Question 50 of 60
50. Question
A network administrator needs to configure UFM Cyber-AI's anomaly detection for ML-based threat detection across a multi-tenant InfiniBand fabric. Which configuration approach ensures the system learns normal traffic patterns while minimizing false positives during the initial training period?
Correct
UFM Cyber-AI's ML-based anomaly detection requires an initial baseline learning period (typically 7-14 days) where the system observes normal traffic patterns across the InfiniBand fabric without generating alerts. This unsupervised learning approach establishes statistical models of legitimate behavior, capturing workload variations, tenant-specific patterns, and temporal cycles. After baseline establishment, the system activates threat detection by identifying deviations from learned norms, significantly reducing false positives while detecting both known and zero-day threats through behavioral analysis.
Question 51 of 60
51. Question
What is Partition management in the context of an InfiniBand Subnet Manager?
Correct
Partition management in InfiniBand Subnet Manager involves assigning Partition Keys (PKeys) to fabric nodes, creating logical network segments within a single physical fabric. Each partition isolates traffic, ensuring nodes communicate only with others sharing the same PKey. This enables multi-tenancy, security boundaries, and workload isolation without requiring separate physical infrastructure.
Question 52 of 60
52. Question
Your datacenter runs high-frequency trading applications requiring ultra-low latency network processing on NVIDIA BlueField-3 DPUs. Network protocol overhead is creating CPU bottlenecks during peak trading hours. When would you use Ethernet offload for network function offloading to address this performance issue?
Correct
Ethernet offload on BlueField DPUs addresses network protocol processing bottlenecks by moving TCP/IP stack operations, checksum calculations, and packet segmentation/reassembly from host CPU to DPU ARM cores. This reduces host CPU utilization by 40-60% and network latency by 30-50%, critical for latency-sensitive applications like high-frequency trading. BlueField-3 provides hardware acceleration for these network functions while maintaining full Ethernet protocol compatibility.
Question 53 of 60
53. Question
What is the primary purpose of the ConnectX-7 Ethernet adapter in AI infrastructure deployments?
Correct
ConnectX-7 Ethernet adapters deliver 400G network bandwidth specifically designed for modern AI infrastructure. They enable high-throughput distributed training across multiple nodes, fast storage access, and efficient GPU cluster communication over Ethernet fabrics. This represents NVIDIA's current-generation solution for network-intensive AI workloads requiring extreme bandwidth beyond 100G or 200G adapters.
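For a rough sense of what the extra bandwidth buys, here is a back-of-the-envelope sketch (assuming ideal line rate and ignoring protocol overhead) of the time to move an 80 GB payload, roughly one H100's memory, across a single link at each adapter generation's line rate:

```python
def transfer_seconds(size_gb: float, rate_gbps: float) -> float:
    """Ideal transfer time: bits to move divided by line rate in Gb/s."""
    return (size_gb * 8) / rate_gbps

for rate in (100, 200, 400):          # adapter line rates in Gb/s
    print(f"{rate} Gb/s -> {transfer_seconds(80, rate):.1f} s")
```

At 400 Gb/s the transfer takes a quarter of the time it takes at 100 Gb/s, which compounds across every synchronization and storage operation in a training run.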
Question 54 of 60
54. Question
What is the primary purpose of BGP route reflectors in data center network architectures?
Correct
BGP route reflectors solve the scalability challenge in large data center fabrics by eliminating the full-mesh iBGP requirement. Instead of each of N routers peering with every other router, which requires (N²-N)/2 sessions, routers peer only with the route reflectors, reducing session count to linear scaling. This architectural pattern is essential for modern leaf-spine topologies running eBGP across the fabric and iBGP within redundancy groups.
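The scaling difference can be sketched numerically (the fabric sizes below are hypothetical examples):

```python
def full_mesh_sessions(n: int) -> int:
    """iBGP full mesh: every router peers with every other router."""
    return n * (n - 1) // 2

def rr_sessions(clients: int, reflectors: int) -> int:
    """Each client peers with every RR; the RRs full-mesh among themselves."""
    return clients * reflectors + full_mesh_sessions(reflectors)

print(full_mesh_sessions(64))      # 2016 sessions for a 64-router full mesh
print(rr_sessions(62, 2))          # 125 sessions for the same 64 routers
```

Adding a router to the full mesh adds N-1 new sessions; with route reflectors it adds only one session per reflector.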
Question 55 of 60
55. Question
A multi-node H100 training cluster experiences performance degradation during large-scale distributed training due to specific network links becoming saturated. In Adaptive Routing scenarios with an InfiniBand fabric, which approach most effectively avoids hotspots during NCCL collective operations?
Correct
Adaptive Routing is the primary congestion avoidance mechanism in InfiniBand fabrics, dynamically selecting optimal paths based on real-time network conditions. By monitoring queue depths and link utilization at fabric switches, it distributes traffic across multiple available paths, preventing any single link from becoming a bottleneck. This is critical for NCCL collective operations in multi-node GPU training where hotspots can severely impact all-reduce performance and overall training throughput.
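A toy simulation (plain Python, not InfiniBand code) of why static per-flow hashing can create hotspots while least-loaded path selection cannot: the static router hashes each flow onto a fixed link, while the adaptive router places each new flow on whichever link currently carries the least load.

```python
import hashlib

def static_loads(num_flows: int, num_links: int) -> list[int]:
    """ECMP-style: a deterministic hash of the flow id picks the link."""
    loads = [0] * num_links
    for f in range(num_flows):
        h = int(hashlib.md5(str(f).encode()).hexdigest(), 16)
        loads[h % num_links] += 1
    return loads

def adaptive_loads(num_flows: int, num_links: int) -> list[int]:
    """Adaptive: each new flow takes the currently least-loaded link."""
    loads = [0] * num_links
    for _ in range(num_flows):
        loads[loads.index(min(loads))] += 1
    return loads

print(max(static_loads(64, 8)), max(adaptive_loads(64, 8)))
```

The adaptive placement always achieves the minimum possible peak load, while the hash-based placement can pile several flows onto one link, which is exactly the hotspot behavior Adaptive Routing is designed to prevent.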
Question 56 of 60
56. Question
A data center team needs to extend Layer 2 connectivity across multiple physical rack locations for tenant workloads while maintaining network isolation and scalability. The infrastructure spans three buildings connected via Layer 3 IP networks. When would you use VXLAN overlay for network virtualization in this scenario?
Correct
VXLAN overlay network virtualization is ideal when extending Layer 2 connectivity across Layer 3 routed infrastructure for multi-tenant environments. It solves VLAN scalability limits (4096 VLANs vs 16M VNIs), enables VM mobility across locations, and provides network abstraction from physical topology. This scenario requires Layer 2 extension across three buildings connected via IP networks, making VXLAN the appropriate solution for tenant isolation and seamless connectivity.
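Two numbers behind the explanation, sketched: the segment-ID space (12-bit VLAN ID vs 24-bit VNI) and the per-packet cost of the VXLAN outer headers (assuming IPv4 outer headers with no 802.1Q tag):

```python
vlan_ids = 2 ** 12                     # 802.1Q VLAN ID field: 12 bits
vxlan_vnis = 2 ** 24                   # VXLAN Network Identifier: 24 bits
print(vlan_ids, vxlan_vnis)            # 4096 vs 16777216 segments

# Encapsulation overhead added to every frame crossing the overlay:
overhead = 14 + 20 + 8 + 8             # outer Ethernet + IPv4 + UDP + VXLAN
print(overhead)                        # 50 bytes per packet
```

The 50-byte overhead is why underlay MTUs are typically raised (e.g. to jumbo frames) when deploying VXLAN across the routed links between buildings.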
Question 57 of 60
57. Question
What is the primary purpose of UFM (Unified Fabric Manager) deployment in NVIDIA InfiniBand network environments?
Correct
UFM (Unified Fabric Manager) is NVIDIA's centralized management platform for InfiniBand networks in AI and HPC environments. Its primary deployment purpose is to provide comprehensive fabric visibility, performance monitoring, topology management, and congestion detection. UFM enables administrators to optimize network routing, troubleshoot connectivity issues, and ensure efficient multi-node communication essential for large-scale distributed training workloads across GPU clusters.
Question 58 of 60
58. Question
A network administrator is configuring a new InfiniBand fabric with 128 compute nodes. During the subnet initialization, the administrator needs to ensure proper node identification for routing decisions. Which addressing approach correctly maps nodes for packet forwarding within the subnet?
Correct
InfiniBand uses a two-tier addressing system: 64-bit GUIDs for permanent hardware identification and 16-bit LIDs for subnet routing. The Subnet Manager discovers nodes via GUIDs, then assigns LIDs dynamically to each port. Switch forwarding tables use LIDs for packet routing decisions within the subnet. This separation allows flexible routing configuration while maintaining unique device identification across network reconfigurations.
Question 59 of 60
59. Question
A research team is training a 175B parameter LLM across 32 H100 GPUs distributed over 4 DGX nodes connected via InfiniBand. During training, they observe that gradient synchronization is creating significant bottlenecks. When would east-west traffic patterns be most critical for optimizing GPU-to-GPU communication in this distributed training scenario?
Correct
East-west traffic patterns are critical during distributed training's collective communication operations, specifically all-reduce for gradient synchronization. In multi-node training, NCCL orchestrates GPU-to-GPU gradient exchanges both within nodes (via NVLink at 900 GB/s) and across nodes (via InfiniBand with GPUDirect RDMA). This horizontal, peer-to-peer communication dominates network bandwidth in large-scale training, distinguishing it from vertical north-south patterns like data loading or checkpointing.
Question 60 of 60
60. Question
An AI training cluster requires 200G HDR InfiniBand connectivity between 128 compute nodes with H100 GPUs. The network architect plans to deploy QM8700 switches as the leaf layer. Which configuration approach ensures optimal port density and minimal latency for GPUDirect RDMA workloads?
Correct
QM8700 architecture optimally supports HDR 200G with its 40-port QSFP56 configuration, providing high density and full bandwidth for H100 clusters. Direct leaf attachments minimize latency for GPUDirect RDMA traffic essential for NCCL all-reduce operations. Dual-rail connectivity enhances bandwidth aggregation and fault tolerance. Alternative configurations either reduce bandwidth (100G mode), introduce compatibility mismatches (EDR), or reference non-existent products, compromising cluster performance for AI training workloads.