Object detection is a complex visual function with applications in many safety-critical domains, such as autonomous driving and medical diagnosis. In this paper, we examine how its behavior can be explained. More precisely, we discuss and analyze the specificities of object detection with respect to explainability, and we describe an approach for explaining the object location predictions of a popular and efficient attention-based deep neural network architecture: DETR.