VIViD Seminars

Welcome to VIViD seminars. To sign up for the seminar mailing list please email [email protected]

If you are interested in becoming a speaker, please contact: [email protected]

Past seminars:

Towards Explainable Abnormal Infant Movements Identification for the Early Prediction of Cerebral Palsy

Providing an early diagnosis of cerebral palsy (CP) is key to enhancing the developmental outcomes for those affected. Diagnostic tools such as the General Movements Assessment (GMA) have produced promising results in early prediction; however, these manual methods can be laborious. In this talk, I will introduce the projects we have been working on closely with our clinical partners since 2018. We focused on using pose-based features extracted from skeletal motions for abnormal infant movement detection, using traditional machine learning as well as deep learning approaches. We further enhanced the interpretability of the proposed models with intuitive visualization.

Edmond Shu-lim Ho is currently an Associate Professor and the Programme Leader for BSc (Hons) Computer Science in the Department of Computer and Information Sciences at Northumbria University, Newcastle, UK. Prior to joining Northumbria University in 2016 as a Senior Lecturer, he was a Research Assistant Professor in the Department of Computer Science at Hong Kong Baptist University. He received his PhD degree from the University of Edinburgh. His research interests include Computer Graphics, Computer Vision, Motion Analysis, and Machine Learning.

The Universal Vulnerability of Human Action Recognition (HAR) Classifiers and Potential Solutions

He Wang
23rd Mar 2022

Deep learning has been regarded as the 'go-to' solution for many tasks today, but its intrinsic vulnerability to malicious attacks has become a major concern. The vulnerability is affected by a variety of factors including models, tasks, data, and attackers. In this talk, we investigate skeleton-based Human Activity Recognition (HAR), which relies on an important type of time-series data widely used in self-driving cars, security and safety, etc. Very recently, we identified a universal vulnerability in existing HAR classifiers by proposing the first adversarial attack approaches for such tasks. Further, we investigate how to enhance the robustness and resilience of existing classifiers across different data, tasks, classifiers and attackers.

He Wang is an Associate Professor in the Visualisation and Computer Graphics group at the School of Computing, University of Leeds, UK. He is also a Turing Fellow, an Academic Advisor at the Commonwealth Scholarship Council, the Director of High-Performance Graphics and Game Engineering, and an academic lead of the Centre for Immersive Technology at Leeds. His current research interests are mainly in computer graphics, vision, machine learning and their applications. Previously he was a Senior Research Associate at Disney Research Los Angeles. He received his PhD and did a post-doc in the School of Informatics, University of Edinburgh.

Deep Learning for Healthcare

Xianghua Xie
16th March 2022

In this talk, I would like to discuss some of our recent attempts at developing predictive models for analysing electronic health records and understanding anatomical structures from medical images. In the first part of the talk, I will present two studies that use electronic health records to predict dementia patient hospitalisation risks and the onset of sepsis in an ICU environment. Two different types of neural network ensemble are used, but both aim to provide some degree of interpretability. For example, in the dementia study, the GP records of each patient were selected from one year before diagnosis up to hospital admission. 52.5 million individual records of 59,298 patients were used. 30,178 were admitted to hospital and 29,120 remained with GP care. From the 54,649 initial event codes, the ten most important signals identified for admission were two diagnostic events (nightmares, essential hypertension), five medication events (betahistine dihydrochloride, ibuprofen gel, simvastatin, influenza vaccine, calcium carbonate and colecalciferol chewable tablets), and three procedural events (third party encounter, social group 3, blood glucose raised). The models performed significantly better than conventional methods. In the second part of the talk, I would like to present our work on graph deep learning and how it can be used to perform segmentation on volumetric medical scans. I will present a graph-based convolutional neural network, which simultaneously learns spatially related local and global features on a graph representation from multi-resolution volumetric data. The Graph-CNN models are then used for the purpose of efficient marginal space learning. Unlike conventional convolutional neural network operators, the graph-based CNN operators allow spatially related features to be learned on the non-Cartesian domain of the multi-resolution space. Some challenges in graph deep learning will be briefly discussed as well.

Xianghua Xie is a Professor at the Department of Computer Science, Swansea University. His research covers various aspects of computer vision and pattern recognition. He started his lecturing career as an RCUK academic fellow, and has been an investigator on several projects funded by EPSRC, Leverhulme, NISCHR, and WORD. He has been working in the areas of Pattern Recognition and Machine Intelligence and their applications to real world problems since his PhD work at Bristol University. His recent work includes detecting abnormal patterns in complex visual and medical data, assisted diagnosis using automated image analysis, fully automated volumetric image segmentation, registration, and motion analysis, machine understanding of human action, efficient deep learning, and deep learning on irregular domains. He has published over 170 research papers and (co-)edited several conference proceedings. 

Video Understanding – An Egocentric Perspective

Dima Damen
23rd Feb 2022

This talk aims to argue for a fine(r)-grained perspective on human-object interactions in video sequences captured from an egocentric perspective (i.e. first-person footage). Using multi-modal input, I will present approaches for determining skill or expertise from video sequences [CVPR 2019], few-shot learning [CVPR 2021], dual-domain learning [CVPR 2020] as well as multi-modal fusion using vision, audio and language [CVPR 2021, CVPR 2020, ICCV 2019, ICASSP 2021]. These approaches centre around the problems of recognition and cross-modal retrieval. All project details at: 
I will also introduce the latest on EPIC-KITCHENS-100, the largest egocentric dataset in people’s homes, and the ongoing collaboration Ego4D.

Dima Damen is a Professor of Computer Vision at the University of Bristol. Dima is currently an EPSRC Fellow (2020-2025), focusing her research on the automatic understanding of object interactions, actions and activities using wearable visual (and depth) sensors. She has contributed to novel research questions including assessing action completion, skill/expertise determination from video sequences, discovering task-relevant objects, dual-domain and dual-time learning as well as multi-modal fusion using vision, audio and language. She is the project lead for EPIC-KITCHENS, the largest dataset in egocentric vision, with accompanying open challenges. She also leads the EPIC annual workshop series alongside major conferences (CVPR/ICCV/ECCV). Dima is a program chair for ICCV 2021 and an associate editor of IJCV, IEEE TPAMI and Pattern Recognition. She was selected as a Nokia Research collaborator in 2016, and as an Outstanding Reviewer at CVPR 2021, CVPR 2020, ICCV 2017, CVPR 2013 and CVPR 2012. Dima received her PhD from the University of Leeds (2009) and joined the University of Bristol as a Postdoctoral Researcher (2010-2012), then served as Assistant Professor (2013-2018) and Associate Professor (2018-2021) before being appointed as chair in August 2021. She supervises 9 PhD students and 5 postdoctoral researchers.

Vision != Photo

Yi-Zhe Song
9th Feb 2022

While the vision community is accustomed to reasoning with photos, one does need to be reminded that photos are mere raw pixels with no semantics. Recent research has recognised this very fact and started to delve into human sketches instead: a form of visual data that has inherently been subjected to human semantic interpretation. This shift has already started to have a profound impact on many facets of research in computer vision, computer graphics, machine learning, and artificial intelligence at large. Sketches have been used not only as a novel means for applications such as cross-modal image retrieval, 3D modelling and forensics, but also as a key enabler for the fundamental understanding of visual abstraction and creativity, which would otherwise be infeasible with photos. This talk will summarise some of these trends, mainly using examples from research performed at SketchX. We will move from conventional sketch topics such as recognition and synthesis to the more recent exciting developments in abstraction modelling and human creativity. We will then talk about how sketch research has redefined some of the more conventional vision topics such as (i) fine-grained visual analysis, (ii) 3D vision (AR/VR), and (iii) OCR. We will finish by highlighting a few open research challenges to drive future sketch research.

Yi-Zhe Song is a Professor of Computer Vision and Machine Learning, and Director of SketchX Lab at the Centre for Vision Speech and Signal Processing (CVSSP), University of Surrey. He obtained a PhD in 2008 in Computer Vision and Machine Learning from the University of Bath, an MSc (with Best Dissertation Award) in 2004 from the University of Cambridge, and a Bachelor’s degree (First Class Honours) in 2003 from the University of Bath. He is an Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), and Frontiers in Computer Science – Computer Vision. He served as a Program Chair for the British Machine Vision Conference (BMVC) 2021, and regularly serves as Area Chair (AC) for flagship computer vision and machine learning conferences, most recently at CVPR’22 and ICCV’21. He is a Senior Member of IEEE, a Fellow of the Higher Education Academy, as well as a full member of the EPSRC review college.

Meta-Learning for Computer Vision and Beyond

Timothy Hospedales
2nd Feb 2022

In this talk I will first give an introduction to the meta-learning field, as motivated by our recent survey paper on meta-learning in neural networks. I hope that this will be informative for newcomers, as well as reveal some interesting connections and contrasts that will be thought-provoking for experts. I will then give a brief overview of recent meta-learning applications, showing how it can be applied to a broad range of problems in computer vision and beyond, including dealing with domain shift, data augmentation, learning with label noise, improving generalisation, accelerating reinforcement learning, and enabling neuro-symbolic learning.

Timothy Hospedales is a Professor within IPAB in the School of Informatics at the University of Edinburgh, where he heads the Machine Intelligence Research group. He is also the Principal Scientist at Samsung AI Research Centre, Cambridge and a Turing Fellow of the Alan Turing Institute. His research focuses on data-efficient and robust machine learning using techniques such as meta-learning and lifelong transfer-learning, in both probabilistic and deep learning contexts. He works in a variety of application areas including computer vision, vision and language, reinforcement learning for robot control, finance and beyond.

Known Operator Learning – An Approach to Unite Machine Learning, Signal Processing and Physics

Andreas Maier
27th Oct 2021

We describe an approach for incorporating prior knowledge into machine learning algorithms. We aim at applications in physics and signal processing in which we know that certain operations must be embedded into the algorithm. Any operation that allows computation of a gradient or sub-gradient with respect to its inputs is suited for our framework. We derive a maximal error bound for deep nets that demonstrates that inclusion of prior knowledge reduces this bound. Furthermore, we show experimentally that known operators reduce the number of free parameters. We apply this approach to various tasks, ranging from computed tomography image reconstruction and vessel segmentation to the derivation of previously unknown imaging algorithms. As such, the concept is widely applicable for many researchers in physics, imaging and signal processing. We expect that our analysis will support further investigation of known operators in other fields of physics, imaging and signal processing.

Prof. Andreas Maier was born on 26th of November 1980 in Erlangen. He studied Computer Science, graduated in 2005, and received his PhD in 2009. From 2005 to 2009 he worked at the Pattern Recognition Lab at the Computer Science Department of the University of Erlangen-Nuremberg. His major research subject was medical signal processing in speech data. In this period, he developed the first online speech intelligibility assessment tool, PEAKS, which has been used to analyze over 4,000 patient and control subjects so far.
From 2009 to 2010, he worked on flat-panel C-arm CT as a post-doctoral fellow at the Radiological Sciences Laboratory in the Department of Radiology at Stanford University. From 2011 to 2012 he worked at Siemens Healthcare as an innovation project manager and was responsible for reconstruction topics in the Angiography and X-ray business unit.
In 2012, he returned to the University of Erlangen-Nuremberg as head of the Medical Reconstruction Group at the Pattern Recognition Lab. In 2015 he became professor and head of the Pattern Recognition Lab. Since 2016, he has been a member of the steering committee of the European Time Machine Consortium. In 2018, he was awarded an ERC Synergy Grant “4D nanoscope”. His current research interests focus on medical imaging, image and audio processing, digital humanities, interpretable machine learning and the use of known operators.

Visual Relevance in New Sensors, Robots and Skill Assessment

Walterio Mayol-Cuevas
1st Dec 2021 [4pm]

In this talk I will cover recent and ongoing work in my group at Bristol University that looks at different aspects of relevance in visual perception. The field of Active Vision in the early days of Computer Vision already posed important questions about what should be sensed and how. In recent years these fundamental concerns have been re-emerging as we want artificial systems to cope with the complexity and amount of information in natural tasks. With this motivation, I will present work that looks at how to sense and process the world from the point of view of novel visual sensors and their algorithms, robots interacting in closed loop with users, and computer vision methods for skill determination aimed at Augmented Reality guidance.

Walterio Mayol-Cuevas is a full professor at the Computer Science Department, University of Bristol, UK, and Principal Research Scientist at Amazon US. He received the B.Sc. degree from the National University of Mexico and the Ph.D. degree from the University of Oxford. His research with students and collaborators proposed some of the earliest applications of visual simultaneous localization and mapping (SLAM) for robotics and augmented reality. More recently, he has been working on visual understanding of skill in video, new human-robot interaction metaphors and Computer Vision for Pixel Processor Arrays. He was General Co-Chair of BMVC 2013 and the General Chair of IEEE ISMAR 2016. He is a topic editor of an upcoming Frontiers in Robotics and AI title on environmental mapping.

Object Recognition and Style Transfer: Two Sides of the Same Coin

Peter Hall
6th Oct 2021

Object recognition numbers among the most important problems in computer vision. The state of the art is able to identify thousands of different object classes, locating individual instances to pixel level in photographs with a reliability close to human.
Style transfer refers to a family of methods that make artwork from an input photograph, in the style of an input style exemplar. Thus a portrait photograph may be re-rendered in the style of van Gogh. It is an increasingly popular approach to manufacturing artwork in the creative sector.
But all object recognition systems suffer a drop in performance of up to 30% when artwork is used as input; things that are clearly visible to humans in artwork manage to evade detection. Style transfer fails to transfer all but the most trivial aspects of style: it is not just Cubism that cannot be emulated, the van Gogh styles would convince nobody if presented as a forgery.
I will argue that these problems are related: that to make art any agent must understand the world visually, and that to understand the visual world robustly requires an abstraction into a semantic form ripe for art making. Our prior work that backs up these claims will be outlined, and the lessons learned will be put to use to advance both style transfer and object recognition, as well as to support new applications in making photographic content available to people with visual impairments.

Peter Hall has been researching at the intersection of computer vision and computer graphics for more than twenty years. During that time he has developed algorithms for automatically emulating a wide range of artwork given input photographs, including Cubism, Futurism, cave art, child art, Byzantine style art, and others. He has developed algorithms for robust object recognition, and was among the first to take an interest in the “cross depiction problem”, which is recognition regardless of rendering style. Recent work includes learning artistic warps (e.g. Dali’s melting watches) and processing photographs into tactile art so that people with visual impairment can have some access to content. Peter is professor of Visual Computing at the University of Bath, and director of a DTC: the Centre for Digital Entertainment. He has actively promoted the intersection of vision and graphics via networks, conference series, seminars etc. He has authored around 150 papers, co-authoring with many international colleagues including Art Historians.

Image Saliency Detection: From Convolutional Neural Network to Capsule Network

Jungong Han
24th Nov 2021

Human beings possess the innate ability to identify the most attractive regions or objects in an image. Salient object detection aims to imitate this ability by automatically identifying and segmenting the most attractive objects in an image. In this talk, I will share two recent works that we published in top venues. In the first work, we showcase a guidance strategy for multi-level contextual information integration under the CNN framework, while in the second work, we demonstrate how we carry out the saliency detection task using Capsule Networks.

Jungong Han is a professor and the director of research of Computer Science at Aberystwyth University, UK. He also holds an Honorary Professorship at the University of Warwick. Han’s research interests span various topics of computer vision and video analytics, including object detection, tracking and recognition, human behavior analysis, and video semantic analysis. With his research students, he has published over 70 IEEE/ACM Transactions papers, and 19 conference papers at CVPR/ICCV/ECCV, NeurIPS, and ICML. His work has been well received, earning over 7900 citations, and his h-index is 45 on Google Scholar. He is the Associate Editor-in-Chief of Elsevier Neurocomputing, and an Associate Editor of several Computer Vision journals.