Vision-Language foundation models