Data Visualization Paper 2023

Question 1

a) With regard to data visualization, briefly explain the following. Use one example for each part.

i. Have a human in the loop.

Computer-based visualization systems are designed to help people perform tasks more effectively. Visualisation is most effective when it is used to augment human capabilities, not to replace them with computational decision-making. For example, visualisation would not be needed if a fully automatic solution existed and was trusted.

ii. Use an external representation.

External representations in data visualisation aim to replace cognition with perception. An example of this is Cerebral, a visualisation system that helps users understand the relationships between genes and experimental conditions.

iii. Depend on vision.

The human visual system is a high-bandwidth channel to the brain, enabling rapid processing and providing an overview of the information. Vision gives us a subjective experience of seeing everything at the same time. Sound, touch, taste, and smell, on the other hand, have a lower bandwidth and do not support an overview in the same way.

iv. Show the data in detail.

Showing the data in detail is important because summaries can lose information. For example, Anscombe's Quartet consists of four datasets with nearly identical summary statistics (mean, variance, correlation, and regression line) but very different distributions when plotted. This demonstrates that details matter for confirming expected patterns, finding unexpected ones, and assessing the validity of statistical models.
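The point about Anscombe's Quartet can be checked with a short Python sketch. The values below are the published quartet; the helper names (pearson, summarize) are chosen here for illustration, and only the standard library is assumed.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet: datasets I to III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics rounded to two decimals."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "r": round(pearson(xs, ys), 2),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, stats)
```

The printed summaries agree to two decimals across all four datasets, so only a plot of the individual points reveals how different the datasets are.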

b) Validation for a visual design is considered to be difficult. Discuss this statement with the use of an example.

It is challenging to validate a visual design because there are many ways to get it wrong at each level of the design process, and each level needs its own validation method. For example, using computational benchmarks to validate an idiom design would be a mismatch: benchmarks measure how fast an algorithm runs, not how effectively the visual representation supports human understanding.

c) Introduce the 3-part analysis framework for visualization design. Use examples where necessary.

The three parts of the framework are What, Why, and How. The framework serves as a guide for understanding the essential components and processes involved in data visualization.

What: This stage focuses on identifying the data that will be used in the visualization. It involves understanding the types of data available (e.g., categorical, quantitative) and their attributes. For example, if you are visualizing sales performance, "What" involves specifying metrics like sales volume, revenue, and profit, and understanding their types and structure.
Why: Here, you determine the purpose behind creating the visualization. This involves understanding the goals and needs of the end-users. For instance, if the goal is to track sales performance, the "Why" would include reasons such as monitoring team effectiveness or evaluating new product sales. This stage helps to clarify the intended outcomes and informs design decisions.
How: This final stage addresses the methods and design choices for building the visualization. It involves deciding on encoding techniques, interactivity, and layout. For instance, you might choose a bar chart to compare sales volumes or a line graph to show revenue trends over time. This stage is where you translate the data and purpose into a visual format.

Question 2

a) Nested Model of Visualization Design and Validation

The nested model of visualization design and validation splits the design of a visualization into four nested levels, where the output of each level is the input to the next. The four levels are Domain Situation, Abstraction, Idiom, and Algorithm.

i. Domain Situation

The domain situation refers to the target users, their questions, and the data within the application domain. It is important to characterise the domain specifically enough to gain traction for the visualization design. For example, consider a group of scientists studying climate change. Their domain questions might involve understanding trends in global temperature or identifying regions most vulnerable to rising sea levels.

ii. Data and Task Abstraction

This level translates domain specifics into a generalized visualization vocabulary.

Data abstraction identifies the dataset and attribute types and considers data transformations based on the task. For the climate scientists, data abstraction might involve treating global temperature data as a time series (a quantitative attribute indexed by time) or deriving a quantitative vulnerability score for each geographic region.
Task abstraction focuses on what users aim to achieve with the visualization. It identifies user tasks and matches them to suitable data types. For example, the scientists might need to discover temperature distributions over time, compare temperature trends across regions, or locate geographical areas with the highest outliers in sea-level rise.

iii. Visual Encoding / Interaction Idiom

This level focuses on "how" the data is shown, using visual encoding and interaction idioms.

Visual encoding uses marks and channels. Marks are geometric primitives, such as points, lines, or areas, that represent items or links in the data. Channels, such as position, color, size, and shape, control the appearance of marks based on attribute values. For the climate scientists, visual encoding could involve a line chart of temperature over time, with one line per region and color hue distinguishing the regions.
Interaction idioms determine how users can manipulate the visualization to gain insights. Examples include selecting data points to get more details, filtering data based on specific criteria, or zooming in on specific regions of interest. In the climate change example, the scientists might use a slider to change the year and observe temperature changes on a map or select a specific country to see its temperature trend over time.

iv. Algorithm

This level ensures the visualization's computational efficiency. It involves developing algorithms for data processing, visual mapping, and rendering the visualization. Efficient algorithms ensure smooth interaction and responsiveness, especially with large datasets. For example, for the climate scientists, algorithms might be necessary to process and display real-time data from multiple weather stations worldwide or to enable interactive exploration of large climate models.

b) Marks and Channels in Visual Encoding

Marks are basic geometric elements used to represent items or links in data. Channels control the appearance of these marks based on associated attributes. Consider Figure 2.1 as an example.

Marks: The graph uses points (dots) as marks to represent different countries.
Channels: It employs various channels to encode information:
- Horizontal position represents the percentage of people who believe vaccines are safe.
- Vertical position separates and aligns countries based on their global region.
- Color (in the coloured version) differentiates the six global regions.

Therefore, marks and channels are the building blocks of a visualization, allowing for the representation of complex data in an understandable format.
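As a rough sketch of how marks and channels fit together, the Python snippet below turns one data record into a point mark. The record fields, region names, palette, and pixel sizes are all hypothetical and are not taken from Figure 2.1.

```python
from dataclasses import dataclass


@dataclass
class PointMark:
    """A point mark plus the channel values that control its appearance."""
    x: float    # horizontal position channel
    y: float    # vertical position channel
    color: str  # color hue channel


def encode(record, regions, palette, width=500.0, row_height=40.0):
    """Map one record to a point mark (all sizes are illustrative pixels)."""
    row = regions.index(record["region"])
    return PointMark(
        x=record["pct_safe"] * width / 100,  # quantitative attribute -> position
        y=row * row_height,                  # categorical attribute -> row
        color=palette[row],                  # categorical attribute -> hue
    )


# Hypothetical inputs, for illustration only.
REGIONS = ["Region A", "Region B"]
PALETTE = ["#1b9e77", "#d95f02"]
RECORD = {"country": "Country X", "region": "Region B", "pct_safe": 80.0}

MARK = encode(RECORD, REGIONS, PALETTE)
print(MARK)
```

Each field of PointMark corresponds to one channel, which makes it explicit that a single mark can carry several attributes at once.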

c) Comments on the Graph (Figure 2.1)

Figure 2.1 visually represents the percentage of people who believe vaccines are safe across various countries and global regions. While it effectively communicates some information, some issues impact its overall effectiveness.

Strengths:

Easy Comparison within Regions: By vertically aligning countries within their respective regions, the graph allows for quick comparisons of vaccine safety perception within those regions. For example, it's clear that Northern European countries generally display higher trust in vaccine safety compared to Eastern European nations.
Regional Medians: The dark vertical lines marking region medians offer a good visual summary of the overall perception within each region.

Issues:

Lack of Clear Regional Ordering: Although countries are grouped by region, the regions themselves lack a clear ordering principle. This makes it difficult to discern global patterns or trends easily. For instance, it's not immediately apparent if there's a trend of higher or lower trust in vaccine safety as we move across different regions.
Overlapping Country Dots: The dots representing individual countries within regions often overlap, especially in regions with many countries. This overlapping hinders the identification of specific countries and their corresponding data points. For example, in the Americas region, it's difficult to distinguish the United States from other countries.
Limited Insight into Country-Specific Information: While the graph is effective in providing a regional overview, it lacks details about individual countries. Displaying country names or using interactive features, like tooltips that appear on hover, could address this limitation. Tooltips, for instance, could reveal the exact percentage and country name when hovering over a specific dot.
Color Considerations: It's crucial to consider if the chosen colours effectively differentiate the regions and if they are accessible to individuals with colour vision deficiencies.

Question 3

a) Compare and contrast the visual representations of network data using node-link diagrams and adjacency matrix representations.

Node-link diagrams represent nodes as point marks and links as line marks that connect the nodes. They are intuitive, familiar, and the most common type of network visualization. However, node-link diagrams can become cluttered and difficult to interpret for large or dense networks.

Adjacency matrix representations, on the other hand, use a matrix to represent the relationships between nodes. The rows and columns of the matrix represent the nodes, and the cells indicate the presence or absence of a link between the corresponding nodes. Adjacency matrices are particularly useful for identifying clusters and patterns in the data, especially for large networks. Unlike node-link diagrams, they are not as intuitive and might require some training to interpret.

Here's a table summarizing the comparison:

Feature	Node-link Diagrams	Adjacency Matrix
Intuitiveness	High	Low
Scalability	Better for small/sparse networks	Better for large networks
Cluster Identification	Can be difficult for dense networks	Easier, especially with reordering
Path Tracing	Easier	Difficult
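To make the matrix representation concrete, here is a minimal Python sketch that builds an adjacency matrix from an edge list. The four-node network is a made-up example, and the links are assumed to be undirected.

```python
def adjacency_matrix(nodes, edges):
    """Build a 0/1 adjacency matrix for an undirected network."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        i, j = index[a], index[b]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected link: mirror across the diagonal
    return matrix


# Hypothetical network, for illustration only.
NODES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

MATRIX = adjacency_matrix(NODES, EDGES)

print("  " + " ".join(NODES))
for node, row in zip(NODES, MATRIX):
    print(node, " ".join(str(cell) for cell in row))
```

Checking whether two nodes are linked is a single cell lookup, whereas tracing a path from A to D means hopping between several rows, which is why path tracing is harder in a matrix than in a node-link diagram.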

b) What is the most suitable spatial visualization for each of the following scenarios?

i. Represent the distribution of resources, such as land area or natural reserves.

Choropleth Map: This is a suitable choice for representing data by geographic region. Each region is drawn as an area, and color intensity (luminance or saturation) encodes the amount of the resource in that region.

ii. Represent population density variations in Sri Lankan districts.

Choropleth Map: This is a suitable option for representing population density variations by district. Each district is a region, and color represents its population density. Density is already a normalized quantity (population divided by area); mapping raw population counts instead would overstate large districts.
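The normalization step can be sketched in Python as follows. The district names, populations, areas, and class breaks are hypothetical placeholders, not real Sri Lankan figures.

```python
from bisect import bisect_right


def population_density(population, area_km2):
    """People per square kilometre."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return population / area_km2


def color_class(density, breaks):
    """Index of the color class a density falls into (0 = lightest)."""
    return bisect_right(breaks, density)


# Hypothetical districts: (population, area in km^2).
DISTRICTS = {
    "District A": (2_000_000, 500.0),
    "District B": (1_000_000, 4_000.0),
}
BREAKS = [500, 1000, 2000]  # hypothetical class boundaries in people/km^2

DENSITIES = {name: population_density(p, a) for name, (p, a) in DISTRICTS.items()}
CLASSES = {name: color_class(d, BREAKS) for name, d in DENSITIES.items()}

print(DENSITIES)
print(CLASSES)
```

With these made-up numbers, District A has twice the population of District B but sixteen times the density, so the two land in very different color classes.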

iii. Represent different types of crime incidents, such as burglaries or thefts, for different neighbourhoods of Colombo.

Symbol Map: Symbol maps are useful for representing data tied to specific locations. Different symbols can be used for different types of crime incidents, and the size of the symbols can represent the frequency of each crime type in a particular neighbourhood.

iv. Represent the racial/ethnic composition in divisional secretariats of Western province.

Stacked Bar Chart: While this is not strictly a spatial visualization, a stacked bar chart can effectively represent the racial/ethnic composition within each divisional secretariat. Each bar can represent a secretariat, and segments within the bar can represent the proportion of each racial/ethnic group.

v. Represent the migration patterns of the people among different countries.

Flow Map: A flow map is generally considered suitable for representing migration patterns. Countries can be represented as nodes, and the thickness of the lines connecting them can represent the volume of migration between those countries.

c) The following is a dataset of power and water usage for four buildings at the University of Moratuwa. You are to plan a visualization of this data for the vice-chancellor to provide insight into the habits of the university.

i. Sketch of the Visualization:

A grouped bar chart is proposed to represent the data.

X-Axis: Building Name (Sumanadasa, ENTC, Library, Faculty of IT)
Y-Axis: Resource Usage, starting at 0, with a shared scale for both Power (kW) and Water (ML). Because the two resources have different units, the unit of each must be stated clearly in the labels.
Bars: Two bars for each building, one representing "On-Peak" usage and the other "Off-Peak" usage.
Color: Distinct hues for "On-Peak" (e.g., orange) and "Off-Peak" (e.g., blue) bars.

ii. Marks and Channels:

Marks:
- Lines (for bars)
Channels:
- Spatial Regions: One for each building, separated horizontally and aligned vertically.
- Length: To express the quantitative value of resource usage (Power and Water).
- Color Hue: To differentiate between "On-Peak" and "Off-Peak" usage.

iii. Justification of Design Decisions:

Grouped Bar Chart: A grouped bar chart effectively allows comparison of "On-Peak" and "Off-Peak" resource usage for each building. The horizontal arrangement facilitates comparison across buildings, while the vertical alignment and shared scale for Power and Water make it easy to compare the two resource usages.
Clear Axis Labeling: Labeling the axes enhances clarity and understanding.
Color to Differentiate: Using distinct colors for "On-Peak" and "Off-Peak" bars enhances the visual separation and makes it easier to perceive the differences.
Starting Y-axis at 0: This ensures the representation of data maintains integrity and avoids misleading visual exaggeration of differences.
Maximizing Data-Ink Ratio: The design focuses on representing the data clearly. Gridlines are not necessary for this visualization and can be removed to maximize the data-ink ratio.

Question 4

a) Name four pre-attentive attributes.

Pre-attentive attributes include the following; any four of them answer the question:

Color Intensity
Color Hue
Line Width
Enclosure
Size
Added Mark
3D depth
Conjunction of blur and color hue
Focus and blur
Conjunction of shape and depth

b) What do you think about the Data Ink Ratio and Data Density of the above visualization? Explain your answer. Suggest 4 changes to the above visualization to increase its Data Ink Ratio and Data Density.

The data ink ratio is defined as the ratio of ink used to display data to the total ink used in the graphic. The data density is defined as the proportion of the total size of the graph that is dedicated to displaying data.

The provided bar chart in Figure 4.1 has a low data-ink ratio because it uses a lot of ink for elements that are not data, such as the gridlines and the background. It also has a low data density because a large proportion of the chart is not used to display data.

Here are four changes to increase the data-ink ratio and data density of the chart:

Remove the gridlines: Gridlines can be helpful for reading the chart, but they are not essential. Removing them would increase the data-ink ratio and data density.
Remove the background: The background does not convey any data and can be removed.
Use a smaller font size: The font size for the axis labels and the title could be smaller.
Combine the title and the axis label: Instead of having a separate title and axis label, they could be combined into a single label.
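The effect of removing non-data ink can be sketched numerically. The pixel counts below are hypothetical and are not measured from Figure 4.1; only the ratio definition (data ink divided by total ink) comes from the text above.

```python
def data_ink_ratio(data_ink, total_ink):
    """Ratio of ink used to display data to total ink in the graphic."""
    if total_ink <= 0:
        raise ValueError("total ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data ink must be between 0 and total ink")
    return data_ink / total_ink


def ratio_for(ink, data_elements):
    """Data-ink ratio for a chart described as {element: ink amount}."""
    total = sum(ink.values())
    data = sum(v for k, v in ink.items() if k in data_elements)
    return data_ink_ratio(data, total)


# Hypothetical ink amounts in pixels, for illustration only.
INK = {"bars": 6000, "gridlines": 2500, "background": 10000, "axes_and_labels": 1500}
DATA_ELEMENTS = {"bars"}

BEFORE = ratio_for(INK, DATA_ELEMENTS)

# Apply the first two suggested changes: drop gridlines and background.
CLEANED = {k: v for k, v in INK.items() if k not in {"gridlines", "background"}}
AFTER = ratio_for(CLEANED, DATA_ELEMENTS)

print(f"before: {BEFORE:.2f}, after: {AFTER:.2f}")
```

Under these assumed numbers the ratio rises from 0.30 to 0.80 without any change to the bars themselves.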

c) Discuss the advantages and disadvantages of unconstrained and constrained navigation techniques, providing examples to support your answer.

Unconstrained navigation gives the user free control over the viewpoint. This is flexible, but it can be difficult to control, making it easy to overshoot or undershoot the desired location. An example of unconstrained navigation is freely zooming and panning a map. Constrained navigation, on the other hand, limits the possible movements, typically by guiding the user along a predefined path shown as an animated transition. This makes the target easy to reach and keeps the user oriented, at the cost of flexibility. An example of constrained navigation is clicking on a point of interest on a map, which triggers the map to zoom and pan to that location automatically.
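A constrained "zoom and pan to target" transition can be sketched as interpolation between two viewports. The viewport layout (center x, center y, zoom) and the use of plain linear interpolation are assumptions made for this sketch; real map systems may use easing or other smoother paths.

```python
def transition_frames(start, target, n_frames):
    """Viewports moving linearly from start to target, ending on target.

    Each viewport is a tuple (center_x, center_y, zoom).
    """
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    frames = []
    for step in range(1, n_frames + 1):
        t = step / n_frames  # progress along the predefined path, in (0, 1]
        frames.append(tuple(s + (e - s) * t for s, e in zip(start, target)))
    return frames


# Hypothetical viewports, for illustration only.
START = (0.0, 0.0, 1.0)
TARGET = (100.0, 50.0, 4.0)

FRAMES = transition_frames(START, TARGET, 4)
for frame in FRAMES:
    print(frame)
```

The user only chooses the target; every intermediate viewpoint is computed by the system, which is what prevents overshooting.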

d) Consider juxtaposing views and animating transitions in an interactive visualization. When would each approach be more suitable? Discuss the complexity of changes, the number of frames, and the cognitive load imposed on the user.

Juxtaposing views, or placing views side by side, is useful for comparing data across those views. This reduces the cognitive load on the user because they can simply move their eyes between views instead of recalling information from a previous view. However, juxtaposed views use more screen space than a single view. An animated transition smoothly changes a single view from one state to another. Animated transitions work well for showing a small number of changes over a small number of frames. However, when the complexity of the transition increases, either due to a large number of changes or a large number of frames, animated transitions become difficult to follow, and juxtaposed views become the better choice. For example, small multiples are preferable to animation when visualising the same gene across different experimental conditions.
