Sofia Broomé

Preprints | Publications | About

About

I have a Ph.D. in machine learning (as of Sept 2, 2022) from KTH Royal Institute of Technology in Stockholm, at the Robotics, Perception & Learning (RPL) division. My advisor was Prof. Hedvig Kjellström, and I was co-advised by Prof. Pia Haubro Andersen. My thesis, titled Learning Spatiotemporal Features in Low-Data and Fine-Grained Action Recognition with an Application to Equine Pain Behavior, can be found here.

My research project has revolved around detecting horses' pain expressions using computer vision. I've worked mostly with video data and have found that the temporal unfolding of the videos is decisive for reliable pain diagnostics; hence my interest in spatiotemporal features and action recognition more broadly.

Why recognize pain in horses?

Learning to detect pain in horses automatically is important because it is difficult for humans, even for veterinary experts in equine pain. Horses are prey animals and tend to hide their pain, not wanting to show when they are vulnerable. Currently, many horses in Sweden are euthanized prematurely due to disorders that are not inherently lethal, such as joint problems, because the conditions are detected too late for treatment.


Background
Before I started my Ph.D., I was in the engineering physics program at KTH, earning my MSc in machine learning. My BSc thesis from 2014 was on the use of partial differential equations in climate modeling. Even longer ago, I lived for two years in Paris, waitressed, and read David Foster Wallace books that made me want to not not study maths.

Reviewing
I was a reviewer for ECCV 2020, CVPR 2021, ICCV 2021, WACV 2022, ICLR 2022, CVPR 2022, NeurIPS 2022, and WACV 2023, and an assistant reviewer for RSS 2020.

CV (last updated: August 2022)

Google scholar page
Twitter
Github

E-mail: broome.sofia [at] gmail.com


News

July 16 2023 With all the recent progress on disease-modifying treatments for Alzheimer's disease, I'm excited to be presenting an abstract with colleagues at CTAD in Boston this October, titled Evaluation of machine learning models that predict Alzheimer's disease progression in observational studies and randomized clinical trials. Find the preliminary program here.

June 18 2023 Our paper Predictive Modeling of Equine Activity Budgets Using a 3D Skeleton Reconstructed from Surveillance Recordings was presented at the CV4Animals workshop at CVPR 2023 in Vancouver. Check out the work on arXiv!

March 23 2023 Our Temporal Shape dataset can now be found on CVF's Computer Vision Exchange (COVE) website for public datasets. I recommend the site: it is a great overview for the CV community of which public datasets exist out there.

Nov 25 2022 Happy to announce that our survey paper Going Deeper than Tracking: A Survey of Computer-Vision Based Recognition of Animal Pain and Emotions is now published in IJCV as part of their special issue on Computer vision approaches to animal tracking and modeling.

Oct 17 2022 I'm moving to Paris to join Therapanacea as a Machine Learning R&D Engineer, working on computer vision in medical imaging, which I'm very excited about! 🎉

Oct 11 2022 Our "Recur, attend or convolve?"-paper was accepted to WACV 2023! ☀️

Sept 28 2022 The day before my defense I gave a talk on horses, pain and temporal information at the RISE Learning Machines seminars, now available on YouTube.

Sept 2 2022 I graduated! 🎩 At my thesis defense, there were great discussions with Efstratios Gavves as opponent, as well as with the grading committee members Guoying Zhao, Ingela Nyström and Kalle Åström. Thank you all very much for being part of the defense. The thesis can be found here.

May 31 2022 I will give a two-part lecture for the Ph.D. course "Digital tools and objective methods for motion research in animals" at the Swedish University of Agricultural Sciences (SLU) in Uppsala on June 10th, on machine learning and computer vision, and on my research within automated horse pain recognition.

May 25 2022 I will have a second poster at CV4Animals! I just found out that our IJCV special issue submission was accepted to the workshop. The article is a survey on computer vision methods for the recognition of pain and affective states in (non-human) animals 🐷

May 22 2022 Our recent PLOS article on pain domain transfer was accepted to the CV4Animals workshop @CVPR22 in June.

April 29 2022 Happy to have been selected as one of 150 participants among 630 applicants for the International Computer Vision Summer School (ICVSS) 2022 in Sicily. Looking forward to it!

April 27 2022 I will be defending my thesis, entitled 'Learning Spatiotemporal Features in Low-Data and Fine-Grained Action Recognition with an Application to Equine Pain Behavior', on September 2nd (with Efstratios Gavves as opponent).

Publications

Going Deeper than Tracking: A Survey of Computer-Vision Based Recognition of Animal Pain and Emotions

Abstract: Advances in animal motion tracking and pose recognition have been a game changer in the study of animal behavior. Recently, an increasing number of works go ‘deeper’ than tracking, and address automated recognition of animals’ internal states such as emotions and pain with the aim of improving animal welfare, making this a timely moment for a systematization of the field. This paper provides a comprehensive survey of computer vision-based research on recognition of pain and emotional states in animals, addressing both facial and bodily behavior analysis. We summarize the efforts that have been presented so far within this topic, classifying them across different dimensions; highlight challenges and research gaps; and provide best practice recommendations for advancing the field, along with some future directions for research.

Sofia Broomé, Marcelo Feighelstein, Anna Zamansky, Gabriel Carreira Lencioni, Pia Haubro Andersen, Francisca Pessanha, Marwa Mahmoud, Hedvig Kjellström, Albert Ali Salah
IJCV 2022.
pdf

Recur, Attend or Convolve? On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition

Abstract: Most action recognition models today are highly parameterized, and evaluated on datasets with appearance-wise distinct classes. It has also been shown that 2D Convolutional Neural Networks (CNNs) tend to be biased toward texture rather than shape in still image recognition tasks, in contrast to humans. Taken together, this raises suspicion that large video models partly learn spurious spatial texture correlations rather than to track relevant shapes over time to infer generalizable semantics from their movement. A natural way to avoid parameter explosion when learning visual patterns over time is to make use of recurrence. Biological vision consists of abundant recurrent circuitry, and is superior to computer vision in terms of domain shift generalization. In this article, we empirically study whether the choice of low-level temporal modeling has consequences for texture bias and cross-domain robustness. In order to enable a light-weight and systematic assessment of the ability to capture temporal structure, not revealed from single frames, we provide the Temporal Shape (TS) dataset, as well as modified domains of Diving48 allowing for the investigation of spatial texture bias in video models. The combined results of our experiments indicate that sound physical inductive bias such as recurrence in temporal modeling may be advantageous when robustness to domain shift is important for the task.

Sofia Broomé, Ernest Pokropek, Boyu Li, Hedvig Kjellström
WACV 2023, to appear.
pdf | code
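
For the curious, the low-level temporal modeling choices studied in the paper can be made concrete in a few lines of PyTorch. The two toy classifiers below are my illustration rather than the paper's actual models: one carries information across frames recurrently through a ConvLSTM cell, the other treats time as just another convolution axis.

    import torch
    import torch.nn as nn

    class ConvLSTMCell(nn.Module):
        """A single convolutional LSTM cell (after Shi et al., 2015)."""
        def __init__(self, in_ch, hid_ch, k=3):
            super().__init__()
            self.hid_ch = hid_ch
            # One convolution computes all four gates at once.
            self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

        def forward(self, x, state):
            h, c = state
            i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
            c = f.sigmoid() * c + i.sigmoid() * g.tanh()
            h = o.sigmoid() * c.tanh()
            return h, c

    class RecurrentVideoNet(nn.Module):
        """Recurrent temporal modeling: a hidden state updated frame by frame."""
        def __init__(self, n_classes, hid_ch=32):
            super().__init__()
            self.cell = ConvLSTMCell(3, hid_ch)
            self.head = nn.Linear(hid_ch, n_classes)

        def forward(self, video):  # video: (B, T, 3, H, W)
            B, T, _, H, W = video.shape
            h = video.new_zeros(B, self.cell.hid_ch, H, W)
            c = torch.zeros_like(h)
            for t in range(T):  # strictly sequential over time
                h, c = self.cell(video[:, t], (h, c))
            return self.head(h.mean(dim=(2, 3)))  # pool space, then classify

    class Conv3DVideoNet(nn.Module):
        """Convolutional temporal modeling: time as one more convolution axis."""
        def __init__(self, n_classes, hid_ch=32):
            super().__init__()
            self.conv = nn.Conv3d(3, hid_ch, kernel_size=3, padding=1)
            self.head = nn.Linear(hid_ch, n_classes)

        def forward(self, video):  # video: (B, T, 3, H, W)
            x = self.conv(video.transpose(1, 2)).relu()  # (B, hid, T, H, W)
            return self.head(x.mean(dim=(2, 3, 4)))

Both take input of shape (batch, time, channels, height, width), so the temporal module can be swapped while everything else is held fixed, which is the kind of controlled comparison the paper aims for.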


Sharing Pain: Using Pain Domain Transfer for Video Recognition of Low-Grade Orthopedic Pain in Horses


Abstract: Orthopedic disorders are common among horses, often leading to euthanasia that could have been avoided with earlier detection. These conditions often create varying degrees of subtle long-term pain. It is challenging to train a visual pain recognition method with video data depicting such pain, since the resulting pain behavior is also subtle, sparsely appearing, and varying, making it challenging even for an expert human labeller to provide accurate ground-truth for the data. We show that a model trained solely on a dataset of horses with acute experimental pain (where labeling is less ambiguous) can aid recognition of the more subtle displays of orthopedic pain. Moreover, we present a human expert baseline for the problem, as well as an extensive empirical study of various domain transfer methods and of what is detected by the pain recognition method, trained on clean experimental pain, in the orthopedic dataset. Finally, this is accompanied by a discussion around the challenges posed by real-world animal behavior datasets and how best practices can be established for similar fine-grained action recognition tasks.

Sofia Broomé, Katrina Ask, Maheen Rashid, Pia Haubro Andersen, Hedvig Kjellström
PLOS ONE 2022.
pdf | code
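
As a rough illustration of the transfer setting (a sketch under my own assumptions, not the paper's exact protocol), the simplest baseline is to reuse an encoder trained on the acute experimental-pain data and only train a new head on the orthopedic target domain:

    import torch.nn as nn

    def adapt_to_orthopedic_domain(encoder: nn.Module, feat_dim: int,
                                   freeze_encoder: bool = True) -> nn.Module:
        """encoder is a hypothetical video model pre-trained on acute
        experimental pain; assumed to map a clip to a feat_dim vector."""
        if freeze_encoder:
            for p in encoder.parameters():
                p.requires_grad = False  # keep the source-domain features fixed
        # New binary head (pain / no pain), trained on the orthopedic videos.
        return nn.Sequential(encoder, nn.Linear(feat_dim, 2))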

Equine Pain Behavior Classification via Self-Supervised Disentangled Pose Representation


Abstract: Timely detection of horse pain is important for equine welfare. Horses express pain through their facial and body behavior, but may hide signs of pain from unfamiliar human observers. In addition, collecting visual data with detailed annotation of horse behavior and pain state is both cumbersome and not scalable. Consequently, a pragmatic equine pain classification system would use video of the unobserved horse and weak labels. This paper proposes such a method for equine pain classification by using multi-view surveillance video footage of unobserved horses with induced orthopaedic pain, with temporally sparse video-level pain labels. To ensure that pain is learned from horse body language alone, we first train a self-supervised generative model to disentangle horse pose from its appearance and background before using the disentangled horse pose latent representation for pain classification. To make best use of the pain labels, we develop a novel loss that formulates pain classification as a multi-instance learning problem. Our method achieves a pain classification accuracy of 60%, surpassing human expert performance. The learned latent horse pose representation is shown to be viewpoint covariant, and disentangled from horse appearance. Qualitative analysis of pain-classified segments shows correspondence between the pain symptoms identified by our model, and equine pain scales used in veterinary practice.

Maheen Rashid, Sofia Broomé, Katrina Ask, Elin Hernlund, Pia Haubro Andersen, Hedvig Kjellström, Yong Jae Lee
WACV 2022.
pdf | code
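
The multi-instance learning idea (one weak label per video, while only some clips in it actually show pain) can be sketched with a simple top-k bag loss along these lines; note that this is an illustrative stand-in, not the paper's novel loss:

    import torch
    import torch.nn.functional as F

    def mil_video_loss(clip_logits: torch.Tensor, video_label: torch.Tensor,
                       k: int = 3) -> torch.Tensor:
        """clip_logits: (B, N) pain scores for N clips per video (k <= N).
        video_label: (B,) with 1 = pain induced, 0 = baseline."""
        # Aggregate each bag as the mean of its top-k clip scores: a pain
        # video should contain at least a few high-scoring clips, whereas
        # a baseline video should contain none.
        bag_logit = clip_logits.topk(k, dim=1).values.mean(dim=1)
        return F.binary_cross_entropy_with_logits(bag_logit, video_label.float())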


hSMAL: Detailed Horse Shape and Pose Reconstruction for Motion Pattern Recognition


Abstract: In this paper we present our preliminary work on model-based behavioral analysis of horse motion. Our approach is based on the SMAL model, a 3D articulated statistical model of animal shape. We define a novel SMAL model for horses based on a new template, skeleton and shape space learned from 37 horse toys. We test the accuracy of our hSMAL model in reconstructing a horse from 3D mocap data and images. We apply the hSMAL model to the problem of lameness detection from video, where we fit the model to images to recover 3D pose and train an ST-GCN network on pose data. A comparison with the same network trained on mocap points illustrates the benefit of our approach.

Ci Li, Nima Ghorbani, Sofia Broomé, Maheen Rashid, Michael J. Black, Elin Hernlund, Hedvig Kjellström, Silvia Zuffi
CV4Animals, CVPR Workshop 2021.
pdf
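
Model-based fitting of this kind usually boils down to minimizing a reprojection error. A generic sketch, with a hypothetical project function standing in for the hSMAL model plus camera, could look as follows:

    import torch

    def fit_pose_to_keypoints(project, init_pose, target_kpts,
                              steps=500, lr=0.01):
        """project: callable mapping pose parameters to 2D joint positions
        (a stand-in for the articulated model and camera projection).
        target_kpts: detected 2D keypoints to match."""
        pose = init_pose.clone().requires_grad_(True)
        opt = torch.optim.Adam([pose], lr=lr)
        for _ in range(steps):
            loss = (project(pose) - target_kpts).pow(2).mean()  # reprojection error
            opt.zero_grad()
            loss.backward()
            opt.step()
        return pose.detach()

In practice such objectives also include pose and shape priors to keep the solution plausible; those are omitted here for brevity.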


Interpreting video features: a comparison of 3D convolutional networks and convolutional LSTM networks

Abstract: A number of techniques for interpretability have been presented for deep learning in computer vision, typically with the goal of understanding what the networks have based their classification on. However, interpretability for deep video architectures is still in its infancy and we do not yet have a clear concept of how to decode spatiotemporal features. In this paper, we present a study comparing how 3D convolutional networks and convolutional LSTM networks learn features across temporally dependent frames. This is the first comparison of two video models that both convolve to learn spatial features but have principally different methods of modeling time. Additionally, we extend the concept of meaningful perturbation introduced by Fong & Vedaldi (ICCV 2017) to the temporal dimension, to identify the temporal part of a sequence most meaningful to the network for a classification decision. Our findings indicate that the 3D convolutional model concentrates on shorter events in the input sequence, and places its spatial focus on fewer, contiguous areas.

Sofia Broomé, Xiaoyu Lu, Joonatan Mänttäri, John Folkesson, Hedvig Kjellström
ACCV 2020.
pdf | project | code
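
The temporal extension of meaningful perturbation can be sketched as learning a per-frame deletion mask by gradient descent, roughly as below. This is my simplified take: the reference "deleted" video is a crude temporal average rather than a proper blur, and all shapes and names are assumptions.

    import torch

    def temporal_mask(model, video, target_class, steps=200, lam=0.05, lr=0.1):
        """video: (1, T, C, H, W); model maps a video to class logits.
        Returns a per-frame relevance mask in [0, 1]: frames whose deletion
        hurts the target score most are the ones the model relied on."""
        T = video.shape[1]
        reference = video.mean(dim=1, keepdim=True).expand_as(video)
        m = torch.full((1, T, 1, 1, 1), 0.5, requires_grad=True)
        opt = torch.optim.Adam([m], lr=lr)
        for _ in range(steps):
            mask = m.clamp(0, 1)
            perturbed = mask * video + (1 - mask) * reference
            score = model(perturbed).softmax(-1)[0, target_class]
            # Drive the class score down while deleting as little as possible.
            loss = score + lam * (1 - mask).abs().mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
        return m.detach().clamp(0, 1).squeeze()  # one relevance value per frame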


Dynamics are Important for the Recognition of Equine Pain in Video


Abstract: A prerequisite to successfully alleviate pain in animals is to recognize it, which is a great challenge in non-verbal species. Furthermore, prey animals such as horses tend to hide their pain. In this study, we propose a deep recurrent two-stream architecture for the task of distinguishing pain from non-pain in videos of horses. Different models are evaluated on a unique dataset showing horses under controlled trials with moderate pain induction, which has been presented in earlier work. Sequential models are experimentally compared to single-frame models, showing the importance of the temporal dimension of the data, and are benchmarked against a veterinary expert classification of the data. We additionally perform baseline comparisons with generalized versions of state-of-the-art human pain recognition methods. While equine pain detection in machine learning is a novel field, our results surpass veterinary expert performance and outperform pain detection results reported for other larger non-human species.

Sofia Broomé, Karina Bech Gleerup, Pia Haubro Andersen, Hedvig Kjellström
CVPR 2019.
pdf | code
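
A recurrent two-stream video model can be boiled down to the following minimal PyTorch sketch. It is illustrative only; the paper's actual architecture uses convolutional LSTM layers rather than per-frame features followed by a vector LSTM. One stream sees RGB frames, the other optical flow, and their temporal summaries are fused for the pain/no-pain decision.

    import torch
    import torch.nn as nn

    class Stream(nn.Module):
        """One stream: per-frame CNN features, then an LSTM over time."""
        def __init__(self, in_ch, feat=64):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.rnn = nn.LSTM(feat, feat, batch_first=True)

        def forward(self, x):  # x: (B, T, C, H, W)
            B, T = x.shape[:2]
            f = self.cnn(x.flatten(0, 1)).view(B, T, -1)
            _, (h, _) = self.rnn(f)
            return h[-1]  # temporal summary, (B, feat)

    class TwoStreamRecurrent(nn.Module):
        """Late fusion of an RGB stream and an optical-flow stream."""
        def __init__(self, feat=64):
            super().__init__()
            self.rgb = Stream(3, feat)
            self.flow = Stream(2, feat)  # flow has 2 channels (dx, dy)
            self.head = nn.Linear(2 * feat, 2)  # pain vs. no pain

        def forward(self, rgb, flow):
            fused = torch.cat([self.rgb(rgb), self.flow(flow)], dim=1)
            return self.head(fused)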

Preprints

Automated Detection of Equine Facial Action Units


Abstract: The recently developed Equine Facial Action Coding System (EquiFACS) provides a precise and exhaustive, but laborious, manual labelling method for facial action units of the horse. To automate parts of this process, we propose a deep learning-based method to detect EquiFACS units automatically from images. We use a cascade framework: we first train several object detectors to detect the predefined Regions of Interest (ROIs), and then apply binary classifiers for each action unit in the related regions. We experiment with both regular CNNs and a more tailored model transferred from human facial action unit recognition. Promising initial results are presented for nine action units in the eye and lower face regions. Code for the project is publicly available.

Zhenghong Li, Sofia Broomé, Pia Haubro Andersen, Hedvig Kjellström
arXiv 2021.
pdf | code
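
The cascade described in the abstract, with region detectors first and per-action-unit binary classifiers second, can be summarized in a few lines of illustrative Python. All names here are hypothetical; the paper's actual detectors and classifiers are trained deep models.

    import torch

    def detect_action_units(image, roi_detector, au_classifiers, threshold=0.5):
        """image: (C, H, W) tensor of a horse's face.
        roi_detector: maps the image to {region_name: (x1, y1, x2, y2)}.
        au_classifiers: {au_name: (region_name, model returning one logit)}."""
        rois = roi_detector(image)  # stage 1: find eye, lower-face, ... regions
        active = []
        for au, (region, clf) in au_classifiers.items():
            if region not in rois:
                continue  # the region this AU lives in was not detected
            x1, y1, x2, y2 = rois[region]
            crop = image[:, y1:y2, x1:x2]  # stage 2 runs on the region crop
            if torch.sigmoid(clf(crop.unsqueeze(0))).item() > threshold:
                active.append(au)  # action unit judged present
        return active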