This paper examines the impact of Whisper, an open-source AI automatic speech recognition (ASR) tool, on captioning work at Emory Libraries and on the staff who perform it. Captions are essential for searchability, discoverability, and user accessibility; however, providing them has typically been challenging because of the resources required. Whisper shows potential for enabling proactive captioning of digitized content, but it also raises questions about how ASR-generated captions affect the staff engaged in this and similar work.
Drawing on data collected from a grant-funded captioning project and an ongoing oral history program, the paper observes that while Whisper reduces the most labor-intensive phase of captioning, this does not necessarily reduce staff time, because of changing demand and the ongoing need for human review to produce accurate, high-quality captions and transcripts. Whisper’s overall impact may be to transform human labor rather than replace it.