ICT lab researcher Bodhiswatta Chatterjee’s journal paper “Semantic Segmentation from Remote Sensor Data and the Exploitation of Latent Learning for Classification of Auxiliary Tasks” has been published in Computer Vision and Image Understanding, 2021. The paper is co-authored by Chatterjee B. and Poullis C.
Abstract: In this paper, we address three different aspects of semantic segmentation from remote sensor data using deep neural networks. Firstly, we focus on the semantic segmentation of buildings from remote sensor data and propose ICT-Net: a novel network with the underlying architecture of a fully convolutional network, infused with feature re-calibrated Dense blocks at each layer. Secondly, as building classification is typically the first step of the reconstruction process, we investigate the relationship of the classification accuracy to the reconstruction accuracy. Finally, we present the simple yet compelling concept of latent learning and the implications it carries within the context of deep learning. We posit that a network trained on a primary task (i.e., building classification) is unintentionally learning about auxiliary tasks (e.g., the classification of roads, trees, etc.) which are complementary to the primary task. We present the results of our experiments and explain how knowledge about auxiliary and complementary tasks, for which the network was never trained, can be retrieved and utilized for further classification. The source code and supplemental material are publicly available at http://www.theICTlab.org/lp/2020ICTNet/
ICT lab researcher Mohsen Parisay’s journal paper “EyeTAP: A Novel Technique using Voice Inputs to Address the Midas Touch Problem for Gaze-based Interactions” has been published in the International Journal of Human-Computer Studies, Elsevier, 2021. The paper is co-authored by Parisay M., Poullis C., and Kersten M.
ICT lab researcher Mohsen Parisay will be presenting our work “IDEA: Index of Difficulty for Eye Tracking Applications. An Analysis Model for Target Selection Tasks”. The work was co-authored by Mohsen Parisay, Charalambos Poullis, and Marta Kersten.
Fitts’ law is a predictive model for measuring the difficulty of target selection with pointing devices. However, emerging devices and interaction techniques require more flexible parameters to adapt the original Fitts’ law to new circumstances and scenarios. We propose the Index of Difficulty for Eye tracking Applications (IDEA), which integrates Fitts’ law with users’ feedback from the NASA TLX to measure the difficulty of target selection. The COVID-19 pandemic has shown the necessity of contact-free interactions on public and shared devices; thus, in this work we aim to propose a model for evaluating contact-free interaction techniques that can accurately measure the difficulty of eye tracking applications and can be adapted to children, users with disabilities, and the elderly without requiring the acquisition of physiological sensory data. We tested the IDEA model using data from a three-part user study with 33 participants that compared two eye tracking selection techniques: dwell-time and a multi-modal eye tracking technique using voice commands.
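The abstract does not give IDEA's formula for combining Fitts' law with NASA TLX feedback, so the sketch below only computes the two standard ingredients it builds on: the Shannon formulation of Fitts' index of difficulty, ID = log2(D/W + 1), and the unweighted ("raw") NASA TLX score, i.e. the mean of the six subscale ratings. Using the raw rather than the weighted TLX is an assumption here, the function names are hypothetical, and how IDEA merges the two values is deliberately left out.

```python
import math
from typing import Mapping

# The six NASA TLX subscales, each rated 0-100.
TLX_SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def fitts_id(distance: float, width: float) -> float:
    """Shannon formulation of Fitts' index of difficulty, in bits: log2(D / W + 1)."""
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return math.log2(distance / width + 1.0)


def raw_tlx(ratings: Mapping[str, float]) -> float:
    """Unweighted ("raw") NASA TLX: the mean of the six subscale ratings."""
    missing = [s for s in TLX_SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing TLX subscales: {missing}")
    return sum(ratings[s] for s in TLX_SUBSCALES) / len(TLX_SUBSCALES)
```

For example, a target 224 px away and 32 px wide has `fitts_id(224, 32) == 3.0` bits, since 224/32 + 1 = 8.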
ICT lab researcher Yashas Joshi will be presenting our work “Dynamic Foveated Rendering for Redirected Walking in Virtual Reality” as a poster at SIGGRAPH 2020. The work was co-authored by Yashas Joshi and Charalambos Poullis.
We present a novel redirected walking technique for virtual reality that leverages dynamic foveated rendering and the psychological phenomenon of inattentional blindness. Extensive testing with three user studies showed that the technique can handle long straight walks in the virtual environment (maximum recorded: 103.9 m in a 4 m × 4 m physical space) and works in real time.
ICT lab researcher Mohsen Parisay will be presenting our work “FELiX: Fixation-based Eye Fatigue Load Index A Multi-factor Measure for Gaze-based Interactions” at the International Conference on Human System Interaction, 2020. The work is co-authored by M. Parisay, C. Poullis, M. Kersten.
🥇 The paper is the recipient of the Best Paper Finalist Award.
Eye fatigue is a common challenge in eye tracking applications, caused by physical and/or mental triggers. Its impact should be analyzed, especially for the dwell-time method. As emerging interaction techniques become more sophisticated, their impact should be analyzed from several perspectives. We propose a novel compound measure for gaze-based interaction techniques that integrates subjective NASA TLX scores with objective measurements of eye movement fixation points. The measure has two variations, depending on whether (a) performance or (b) accuracy is emphasized when measuring potential eye fatigue in eye tracking interactions. These variations enable researchers to compare eye tracking techniques on different criteria. We evaluated our measure in two user studies with 33 participants and report the results of comparing dwell-time with gaze-based selection using voice recognition.
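The abstract does not say how fixation points were extracted from raw gaze data. One widely used option, shown here only as an illustration and not necessarily what FELiX uses, is the dispersion-threshold (I-DT) algorithm of Salvucci and Goldberg: a window of consecutive gaze samples counts as a fixation while its dispersion, (max x − min x) + (max y − min y), stays under a threshold. The sketch assumes a fixed sampling rate, so the minimum fixation duration is expressed as a number of samples; `max_dispersion`, `min_samples`, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Fixation:
    x: float    # centroid of the fixation window
    y: float
    start: int  # index of the first gaze sample in the fixation
    end: int    # index of the last gaze sample (inclusive)


def _dispersion(points: Sequence[Point]) -> float:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(samples: Sequence[Point], max_dispersion: float, min_samples: int) -> List[Fixation]:
    """Dispersion-threshold identification (I-DT) of fixations in a gaze trace."""
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i + min_samples <= n:
        j = i + min_samples  # current window is samples[i:j]
        if _dispersion(samples[i:j]) <= max_dispersion:
            # Grow the window while the dispersion stays under the threshold.
            while j < n and _dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            window = samples[i:j]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append(Fixation(cx, cy, i, j - 1))
            i = j  # consume the fixation's samples
        else:
            i += 1  # first sample belongs to a saccade; slide the window forward
    return fixations
```

Counts and durations of the detected fixations are the kind of objective eye movement measurement the abstract describes combining with subjective TLX scores.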
The work on “Inattentional Blindness for Redirected Walking Using Dynamic Foveated Rendering” has been accepted for publication in IEEE Access 2020. The work is co-authored by Yashas Joshi and Charalambos Poullis.
Redirected walking is a Virtual Reality (VR) locomotion technique that enables users to navigate virtual environments (VEs) that are spatially larger than the available physical tracked space. In this work, we present a novel technique for redirected walking in VR based on the psychological phenomenon of inattentional blindness. Based on the user’s visual fixation points, we divide the user’s view into zones. Spatially-varying rotations are applied according to each zone’s importance and are rendered using foveated rendering. Our technique runs in real time and is applicable to both small and large physical spaces. Furthermore, the proposed technique does not require stimulated saccades; instead, it takes advantage of naturally occurring saccades and blinks to perform a complete refresh of the framebuffer. We performed extensive testing and present an analysis of the results of three user studies conducted for the evaluation.
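To make the idea of zone-dependent rotation concrete, here is a toy sketch under our own assumptions, not the authors' implementation: the angular eccentricity of a screen region from the fixation direction selects a zone, each zone caps how much of the pending redirection rotation may be applied in one frame, and during a saccade or blink the full pending rotation is applied to the whole frame. The zone names, boundaries, per-frame caps, and function names are all hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical zones: (name, max eccentricity in degrees, max rotation in degrees per frame).
# Illustrative values only; they are not taken from the paper.
ZONES = [
    ("foveal", 10.0, 0.0),       # where the user is looking: no redirection
    ("parafoveal", 30.0, 0.05),
    ("peripheral", 180.0, 0.2),  # inattentional blindness tolerates larger changes here
]


def eccentricity_deg(direction: Sequence[float], gaze: Sequence[float]) -> float:
    """Angle in degrees between a view direction and the gaze (fixation) direction."""
    dot = sum(a * b for a, b in zip(direction, gaze))
    norm = math.hypot(*direction) * math.hypot(*gaze)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def zone_of(direction: Sequence[float], gaze: Sequence[float]) -> Tuple[str, float]:
    """Return (zone name, per-frame rotation cap) for a view direction."""
    ecc = eccentricity_deg(direction, gaze)
    for name, max_ecc, max_rot in ZONES:
        if ecc <= max_ecc:
            return name, max_rot
    return ZONES[-1][0], ZONES[-1][2]


def rotation_this_frame(direction, gaze, pending_deg: float, saccade_or_blink: bool = False) -> float:
    """Portion of the pending redirection rotation applied to this region this frame."""
    if saccade_or_blink:
        return pending_deg  # user is momentarily blind: refresh everything with the full rotation
    _, cap = zone_of(direction, gaze)
    return max(-cap, min(cap, pending_deg))
```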
An ICT lab researcher will be presenting our work “Portal to knowledge: A Virtual Library Using Markerless Augmented Reality System for Mobile Device” at SPIE AR|VR|MR 2020. The work is co-authored by Yashas Joshi and Charalambos Poullis.
Although Augmented Reality (AR) is not a particularly recent topic, it has rapidly become one of the most prominent research subjects since highly efficient hand-held devices became readily available. These powerful devices can process large amounts of data in the blink of an eye, making it feasible to overlay interactive, computer-generated graphics on real-world images in real time and enhance the user’s immersive experience. In this paper, we present a novel mobile application that allows users to explore and interact with a virtual library in their physical space using markerless AR. Digital versions of books are represented by 3D book objects on bookcases, similar to an actual library. Using an in-app gaze controller, the user’s gaze is tracked and mapped into the virtual library. This allows users to select (via gaze) the digital version of any book and download it for perusal. To complement the immersive user experience, continuity is maintained by using the concept of portals for any transition from AR to immersive VR or vice versa, corresponding to a transition from a “physical” to a virtual space. The use of portals makes these transitions simple and seamless for the user. The presented application was implemented using the Google ARCore SDK and Unity 3D, and will serve as a handy tool for spawning a virtual library anytime and anywhere, giving the user an immediate sense of being in an actual, traditional library while having the digital version of any book on the go.
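The abstract says the user’s gaze is mapped into the virtual library so that a book can be selected. In a game engine this is normally done with a built-in raycast; as an engine-agnostic illustration only (not the app’s Unity code), the sketch below casts a ray from the camera along the gaze direction and picks the nearest book whose axis-aligned bounding box it hits, using the standard slab test. The `Book` class and the function names are hypothetical, and how a selection is confirmed (e.g. by dwell) is not covered.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Book:
    """Hypothetical book object: a title plus an axis-aligned bounding box."""
    title: str
    box_min: Vec3
    box_max: Vec3


def ray_aabb_distance(origin: Vec3, direction: Vec3, box_min: Vec3, box_max: Vec3) -> Optional[float]:
    """Slab test: distance along the ray to the box, or None if the ray misses it."""
    t_near, t_far = -math.inf, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            # Ray is parallel to this pair of slabs: it misses unless it starts between them.
            if o < lo or o > hi:
                return None
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        t_near, t_far = max(t_near, t1), min(t_far, t2)
        if t_near > t_far:
            return None
    if t_far < 0:
        return None  # the box is entirely behind the viewer
    return max(t_near, 0.0)


def gazed_book(origin: Vec3, direction: Vec3, books: Sequence[Book]) -> Optional[Book]:
    """Return the nearest book hit by the gaze ray, or None if nothing is hit."""
    best, best_t = None, math.inf
    for book in books:
        t = ray_aabb_distance(origin, direction, book.box_min, book.box_max)
        if t is not None and t < best_t:
            best, best_t = book, t
    return best
```

Taking the nearest hit matters on a bookcase, where a single gaze ray can pass through several book boxes.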
ICT lab researcher Majid Pourmemar will be presenting our work “Visualizing and Interacting with Hierarchical Menus in Immersive Augmented Reality”. The work is co-authored by Majid Pourmemar and Charalambos Poullis.
🥇 Majid, who will be presenting the work at the conference, received the “VRCAI 2019 Diversity and Inclusion Scholarship” sponsored by Disney Research.
Abstract: Graphical User Interfaces (GUIs) have long been used to inform the user of the large number of available actions and options. GUIs in desktop applications traditionally appear in the form of two-dimensional hierarchical menus due to the limited screen real estate, the spatial restrictions imposed by the hardware (e.g., 2D displays), and the available input modalities (e.g., mouse/keyboard point-and-click, touch, dwell-time). In immersive Augmented Reality (AR), there are no such restrictions and the available input modalities are different (i.e., hand gestures, head pointing, or voice recognition), yet the majority of AR applications still use the same type of GUIs as desktop applications. In this paper, we focus on identifying the most efficient combination of (hierarchical menu type, input modality) for immersive applications using AR headsets. We report on the results of a within-subjects study with 25 participants who performed a number of tasks using four combinations of the most popular hierarchical menu types with the most popular input modalities in AR, namely: (drop-down menu, hand gestures), (drop-down menu, voice), (radial menu, hand gestures), and (radial menu, head pointing). Results show that the majority of the participants (60%, i.e., 15 of 25) achieved faster performance using the hierarchical radial menu with head pointing control. Furthermore, the participants clearly indicated the radial menu with head pointing control as the most preferred interaction technique due to its limited physical demand, as opposed to the current de facto interaction technique in AR, i.e., hand gestures, which become physically demanding after prolonged use, leading to the arm fatigue known as “Gorilla arms”.
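As an illustration of how head pointing can drive a radial menu, the sketch below maps the head-pointing cursor’s offset from the menu centre to a sector. The layout (items clockwise from 12 o’clock, equal angular sectors, a small central dead zone) and the function name are assumptions for illustration, not the study’s implementation; descending into a submenu would simply re-centre the menu on the chosen item.

```python
import math
from typing import Optional


def radial_menu_item(cursor_x: float, cursor_y: float, n_items: int, dead_zone: float = 0.1) -> Optional[int]:
    """Map a head-pointing cursor offset (relative to the menu centre, +y up) to an item index.

    Items are laid out clockwise starting at 12 o'clock, each spanning 360 / n_items degrees.
    Returns None while the cursor is inside the central dead zone.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    if math.hypot(cursor_x, cursor_y) < dead_zone:
        return None
    # atan2(x, y) gives the angle measured clockwise from the +y (up) axis.
    angle = math.degrees(math.atan2(cursor_x, cursor_y)) % 360.0
    sector = 360.0 / n_items
    # Shift by half a sector so that item 0 is centred on 12 o'clock.
    return int(((angle + sector / 2.0) % 360.0) // sector)
```

Because selection depends only on the direction of the offset, not its exact distance, small head movements suffice, which is consistent with the low physical demand participants reported.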
ICT lab researcher Pinjing Xu will be presenting our work “Delineation of Road Networks Using Deep Residual Neural Networks and Iterative Hough Transform”. The work is co-authored by Pinjing Xu and Charalambos Poullis.
In this paper, we present a complete pipeline for extracting road network vector data from satellite RGB orthophotos of urban areas. Firstly, a network based on the SegNeXt architecture with a novel loss function is employed for the semantic segmentation of the roads. Results show that the proposed network produces, on average, better results than other state-of-the-art semantic segmentation techniques. Secondly, we propose a fast post-processing technique for vectorizing the rasterized segmentation result, removing erroneous lines, and refining the road network. The result is a set of vectors representing the road network. We have extensively tested the proposed pipeline and provide quantitative and qualitative comparisons with other state-of-the-art techniques based on a number of well-known metrics.
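The title names an iterative Hough transform for the vectorization step, but the abstract does not detail it. A minimal NumPy sketch of the general idea, under our own assumptions: build a standard (ρ, θ) Hough accumulator over the binary road mask, take the strongest peak as a line, erase the mask pixels within a small band of that line, and repeat until no peak has enough votes. The parameters `min_votes`, `band`, `max_lines`, and `n_theta` are hypothetical; the sketch returns infinite lines in (ρ, θ) form and leaves out segment endpoints and the paper’s refinement steps.

```python
import numpy as np


def hough_accumulator(mask, n_theta=180):
    """Standard (rho, theta) Hough accumulator over the foreground pixels of a binary mask."""
    ys, xs = np.nonzero(mask)
    h, w = mask.shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)  # theta in [0, 180) degrees
    rhos = np.arange(-diag, diag + 1)
    acc = np.zeros((rhos.size, n_theta), dtype=np.int64)
    for t, theta in enumerate(thetas):
        # Each pixel votes for the rho of the line through it at this angle.
        r = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(np.int64) + diag
        acc[:, t] = np.bincount(r, minlength=rhos.size)
    return acc, rhos, thetas


def iterative_hough_lines(mask, min_votes, max_lines=50, band=1.5, n_theta=180):
    """Repeatedly extract the strongest line, then remove the pixels it explains."""
    remaining = mask.astype(bool).copy()
    ys_all, xs_all = np.mgrid[0:mask.shape[0], 0:mask.shape[1]]
    lines = []
    for _ in range(max_lines):
        if not remaining.any():
            break
        acc, rhos, thetas = hough_accumulator(remaining, n_theta)
        r_idx, t_idx = np.unravel_index(np.argmax(acc), acc.shape)
        if acc[r_idx, t_idx] < min_votes:
            break  # no remaining peak is strong enough to be a road line
        rho, theta = float(rhos[r_idx]), float(thetas[t_idx])
        lines.append((rho, theta))
        # Erase pixels within `band` pixels of the detected line before the next pass.
        dist = np.abs(xs_all * np.cos(theta) + ys_all * np.sin(theta) - rho)
        remaining &= dist > band
    return lines
```

Removing explained pixels after each detection keeps a dominant road from re-voting for nearly identical lines, which is the main motivation for iterating rather than taking all accumulator peaks at once.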
The source code and supplemental material are publicly available at https://theICTlab.org/lp/2019Re_X/
Join us at the 16th Conference on Computer and Robot Vision, 2019
ICT lab researcher Bodhiswatta Chatterjee will be presenting our work “On Building Classification from Remote Sensor Imagery Using Deep Neural Networks and the Relation Between Classification and Reconstruction Accuracy Using Border Localization as Proxy”. The work is co-authored by Bodhiswatta Chatterjee and Charalambos Poullis.
Abstract: Convolutional neural networks have been shown to achieve very high accuracy on certain visual tasks, in particular semantic segmentation. In this paper, we address the problem of semantic segmentation of buildings from remote sensor imagery. We present ICT-Net: a novel network with the underlying architecture of a fully convolutional network, infused with feature re-calibrated Dense blocks at each layer. Uniquely, the proposed network combines the localization accuracy and use of context of the U-Net network architecture, the compact internal representations and reduced feature redundancy of the Dense blocks, and the dynamic channel-wise feature re-weighting of the Squeeze-and-Excitation (SE) blocks. The proposed network has been tested on INRIA’s benchmark dataset and is shown to outperform all other state-of-the-art methods by more than 1.5% on the Jaccard index.
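Two concrete ingredients named in the abstract can be sketched in a few lines: the channel-wise re-weighting performed by a Squeeze-and-Excitation block (Hu et al.) and the Jaccard index used for evaluation. The NumPy code below is a forward pass of a generic SE block (global average pooling, a two-layer bottleneck with ReLU and sigmoid, then per-channel scaling) plus an intersection-over-union computation for binary masks. The weights are placeholders supplied by the caller, not ICT-Net’s, and how the SE blocks are wired into the Dense blocks is not shown.

```python
import numpy as np


def squeeze_excitation(features, w1, b1, w2, b2):
    """Channel-wise re-calibration of a (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r) for some reduction ratio r.
    """
    squeezed = features.mean(axis=(1, 2))              # squeeze: global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeezed + b1)       # excitation: FC + ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))  # FC + sigmoid -> per-channel weights in (0, 1)
    return features * scale[:, None, None]             # re-weight each channel


def jaccard_index(pred, target):
    """Jaccard index (intersection over union) of two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention assumed here)
    return float(np.logical_and(pred, target).sum() / union)
```

For example, masks `[[1, 1, 0, 0]]` and `[[0, 1, 1, 0]]` share one pixel out of three covered pixels, giving a Jaccard index of 1/3.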
Furthermore, as building classification is typically the first step of the reconstruction process, in the latter part of the paper we investigate the relationship of the classification accuracy to the reconstruction accuracy. A comparative quantitative analysis of reconstruction accuracies corresponding to different classification accuracies confirms the strong correlation between the two. We present results showing a consistent and considerable reduction in reconstruction accuracy as classification accuracy decreases. The source code and supplemental material are publicly available at https://www.theICTlab.org/lp/2019ICTNet/.