PD Stefan Bosse - Long-term Longitudinal data collection and analysis using Agents

Long-term Longitudinal data collection and analysis in highly dynamic systems using mobile Crowd Sensing and mobile Agents

Challenges and Issues

PD Dr. Stefan Bosse

sbosse@uni-bremen.de

University of Bremen, Dept. Mathematics and Computer Science, Bremen, Germany

Introduction

This work addresses longitudinal data collection and aggregation that can be used for:

  1. Data Analysis and Data Mining (statistical);
  2. Data- and Event-driven Simulation;
  3. Automated Prediction and Classification using Machine Learning (ML is a kind of simulation, i.e. extrapolation);
  4. Time-series analysis and prediction.

All four domains depend on the strength and statistical quality of the data on both the vertical (variable) and the horizontal (longitudinal time) scale!

Incremental longitudinal data sampling is a challenge!

Longitudinal Surveys

Typical applications of classical longitudinal surveys are (Lynn, 2009):

  • Surveys of businesses
  • Surveys of school-leavers, graduates or trainees
  • Household panel surveys
  • Birth cohort studies
  • Epidemiological studies
  • Social Networking
  • Socio-technical Systems

Surveys are typically participatory and rely on models and survey plans

Crowdsensing is typically opportunistic and self-organizing

Longitudinal Data Sampling

(a) Traditional survey-based data sampling and static modelling using participatory mechanisms; (b) continuous crowdsensing-based, data-driven modelling using opportunistic mechanisms

Longitudinal Data Sampling

Issues with Longitudinal Sampling:

  • Curse of dimensionality of longitudinal data (LD):

LD = P × O × L × V × t

with P: persons, O: occasions, L: locations/places, V: variables, t: time

  • Sampling in time space (horizontal axis)
    • periodically (polling);
    • event-based;
    • random.
  • Sampling in variable space (vertical axis)
    • Bias, fraud;
    • Distortion, noise, failure;
    • Missing data, impurity.
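
The size of the data space and the three time-axis sampling strategies listed above can be sketched as follows. This is a minimal illustration, not taken from the slides: the function names (`ld_cells`, `periodic_times`, `event_times`, `random_times`) are hypothetical, and an "event" is assumed here to be an upward threshold crossing of a signal.

```python
import random

def ld_cells(persons, occasions, locations, variables, time_steps):
    """Upper bound on the number of cells in LD = P x O x L x V x t."""
    return persons * occasions * locations * variables * time_steps

def periodic_times(start, stop, period):
    """Polling: sample at fixed intervals in [start, stop)."""
    times = []
    t = start
    while t < stop:
        times.append(t)
        t += period
    return times

def event_times(signal, threshold):
    """Event-based: sample when the signal rises above the threshold.

    `signal` is a list of (time, value) pairs; an event is assumed to be
    an upward threshold crossing.
    """
    times = []
    above = False
    for t, value in signal:
        if value > threshold and not above:
            times.append(t)
        above = value > threshold
    return times

def random_times(start, stop, count, seed=0):
    """Random: draw `count` sample times uniformly between start and stop."""
    rng = random.Random(seed)
    return sorted(rng.uniform(start, stop) for _ in range(count))
```

Even modest factors multiply quickly: `ld_cells(100, 10, 5, 20, 52)` already gives 5,200,000 cells, which is why data reduction close to the source matters.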

Errors in Longitudinal Data Sampling

  • Coverage Error
  • Sampling Error
  • Non-response Error
  • Measurement Error

(Lynn, 2009)

(Mobile) Crowdsensing can help to reduce Coverage, Sampling, and Non-response Errors and to extend the data space with environmental/context sensor variables

On-line vs. Off-line Data Mining and Machine Learning

(a) Off-line Surveys (b) On-line Longitudinal Data Mining

On-line vs. Off-line Simulation

(a) Off-line data-driven ABS (b) On-line data- and event-driven ABS

The Concept

A Unified Approach: Agents connect Real World & Simulation

  • ABM: Agent-based Modelling
  • ABS: Agent-based Simulation
  • ABC: Agent-based Computation
  • MCWS: Mobile agent-based Crowdsensing
  • ML: Machine Learning
  • SM: Surrogate Modelling

Mobile Crowdsensing is event-driven or request-reply-based. It uses mobile agents to sample sensors on mobile devices and to perform micro surveys (dynamic/conditional scripts) via chat dialogues.
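
A dynamic/conditional micro survey can be pictured as a small script in which each answer decides the next question. The sketch below is only an illustration under assumptions: the `SURVEY` structure, the question texts, and the `run_survey` function are hypothetical, and the `ask` callable stands in for the chat dialogue on the mobile device.

```python
# Hypothetical micro-survey script: each question names the next question
# depending on the answer (dynamic/conditional flow).
SURVEY = {
    "start": {
        "text": "Did you leave home today?",
        "next": lambda answer: "transport" if answer == "yes" else None,
    },
    "transport": {
        "text": "Which transport did you use?",
        "next": lambda answer: "crowded" if answer == "bus" else None,
    },
    "crowded": {
        "text": "Was the bus crowded?",
        "next": lambda answer: None,
    },
}

def run_survey(survey, ask, first="start"):
    """Walk the script, calling `ask(text)` for each question.

    `ask` stands in for the chat dialogue; it returns the user's answer
    as a string. Returns the collected answers keyed by question id.
    """
    answers = {}
    key = first
    while key is not None:
        question = survey[key]
        answer = ask(question["text"])
        answers[key] = answer
        key = question["next"](answer)
    return answers
```

With scripted replies, `run_survey(SURVEY, lambda text: "no")` stops after the first question, so a participant is only asked what is relevant to them.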

The Concept

Unified Agent Methodology for longitudinal data mining, modelling of complex systems, and simulation

Agent-based On-line Simulation: Software Architecture

Two agent classes are used: physical simulation agents (red) and computational software agents (blue; in simulation and real world). (Bosse, Engel, 2019, Sensors)

Time Machine

ABC Crowdsensing can be used to:

  1. Update simulations in real time ⇒ Variance by Digital Twins;
  2. Fork simulation runs with time-compressing speed-up;
  3. Create simulation snapshots for future world evolution ⇒ Weather Forecast.
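
The three uses named above (update, fork, snapshot) can be illustrated with a toy simulation. This is a sketch under assumptions, not the simulator used in this work: the `Simulation` class, its random-walk state, and the `forecast` helper are hypothetical.

```python
import copy
import random

class Simulation:
    """Toy simulation: one scalar state that performs a random walk."""

    def __init__(self, seed=0):
        self.time = 0
        self.state = 0.0
        self.rng = random.Random(seed)

    def step(self):
        self.state += self.rng.gauss(0.0, 1.0)
        self.time += 1

    def update_from_sensor(self, value):
        """Synchronise the state with a (crowdsensed) real-world value."""
        self.state = value

    def snapshot(self):
        """Return an independent copy of the full simulation state."""
        return copy.deepcopy(self)

def forecast(sim, horizon):
    """Fork the simulation and run the fork ahead without touching `sim`."""
    fork = sim.snapshot()
    for _ in range(horizon):
        fork.step()
    return fork.state
```

The fork runs as fast as the host allows, while the original simulation stays synchronised with the real world and can be forked again after the next sensor update.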

Virtual Sensors

  • Agents can take on the following roles:
    • Physical agents in simulation;
    • Computational agents performing crowd sensing (physical sensors);
    • Computational agents performing sensor aggregation, event detection, and data reduction (virtual sensors) ⇒ Longitudinal Data Sampling.

Virtual Sensors implemented by mobile or stationary agents are a central part of the longitudinal data sampling and data reduction methodology (including calibration).
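
As a minimal sketch of what such a virtual sensor might do, the class below reduces raw readings to windowed means, applies a calibration, and flags events. All names (`VirtualSensor`, `push`) are hypothetical, and the additive offset calibration and the threshold rule are simplifying assumptions.

```python
from statistics import mean

class VirtualSensor:
    """Aggregates raw readings into windowed means and flags events."""

    def __init__(self, window, threshold, offset=0.0):
        self.window = window          # readings per aggregate (data reduction)
        self.threshold = threshold    # aggregate above this counts as an event
        self.offset = offset          # simple additive calibration (assumed)
        self.buffer = []

    def push(self, raw):
        """Feed one raw reading; return (mean, is_event) once per full window."""
        self.buffer.append(raw + self.offset)
        if len(self.buffer) < self.window:
            return None
        value = mean(self.buffer)
        self.buffer = []
        return value, value > self.threshold
```

With `window=3`, six raw readings yield only two aggregated samples, so the longitudinal record grows at a third of the raw sampling rate.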

Three sensing domains: (left) physical sensors, (middle) virtual sensors, (right) data mining/application

Use-case: Pandemic Simulation and Time-series Prediction

  1. Goal: Time-series prediction of the dynamics of infection cases in pandemic situations

  2. Methodologies:

     a. Data Mining of already existing institutional longitudinal data and Machine Learning (time-series extrapolation)
     b. Surrogate Modelling of ABS using data from simulation and auxiliary data from mobile crowd sensing (crowd behaviour, decision making, opinions); the simulation is seeded with data from a.

Institutional Data Mining and Machine Prediction

  • Data from the Robert Koch Institute (weekly infection case notifications)
  • Time-series prediction by LSTM-ANN
  • Biased and distorted/uncalibrated sensor data (unknown test sampling over time)

Crowd-driven Simulation and Surrogate Modelling

  • Domain-partitioned parameterised Gas Cellular Automata simulation
  • Time-series prediction by LSTM-ANN
  • Calibrated sensor data from simulation
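
Both approaches feed a time series to a sequence model. A common preparation step, shown here only as an assumed illustration and not as the pipeline used in this work, is to cut the series into sliding input/target windows. The function name `make_windows` is hypothetical, and the numbers in the usage note are made up, not real case counts.

```python
def make_windows(series, n_in, n_out=1):
    """Split a series into (input window, target window) training pairs."""
    pairs = []
    for i in range(len(series) - n_in - n_out + 1):
        inputs = series[i:i + n_in]
        targets = series[i + n_in:i + n_in + n_out]
        pairs.append((inputs, targets))
    return pairs
```

For example, `make_windows([1, 2, 3, 4, 5], 3)` pairs each run of three values with the value that follows it; a model trained on such pairs extrapolates one step ahead.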

Use Case: Segregation Simulation

  1. Goal: Study of segregation effects (cluster groups) with individual (variant) behaviour based on mobility and social networking

  2. Methodologies:

     a. Agent-based Simulation with parameterised mobility and interaction models

     b. Agent-based Crowdsensing that performs micro surveys via mobile devices and chat dialogues, finally creating digital twins that introduce behaviour model variance.

Closed Simulation with static agent behaviour

  • Static segregation behaviour model
  • Group A/B cluster formation

Open Simulation with On-line Crowdsensing and Digital Twins

  • Digital Twins introduce behaviour variance on micro level
  • Global clustering outcome differs! (Bosse, Engel, SSC 2019)
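
The difference between the two set-ups can be explored with a Schelling-style toy model on a ring. This is an assumed stand-in, not the simulation from the cited work: the functions `like_fraction`, `simulate`, and `mean_like_fraction` are hypothetical, and the per-agent tolerance drawn at random merely imitates the variance that digital twins would supply from survey data. The sketch assumes there are always more cells than agents.

```python
import random

def like_fraction(cells, i, radius=2):
    """Fraction of occupied ring neighbours of cell i in the same group."""
    n = len(cells)
    neighbours = [cells[(i + d) % n] for d in range(-radius, radius + 1) if d != 0]
    occupied = [c for c in neighbours if c is not None]
    if not occupied:
        return 1.0
    return sum(c[0] == cells[i][0] for c in occupied) / len(occupied)

def simulate(n_cells=60, n_agents=40, steps=2000, tolerance=None, seed=0):
    """Run the toy model; `tolerance(rng)` draws each agent's threshold."""
    rng = random.Random(seed)
    if tolerance is None:
        tolerance = lambda rng: 0.5   # closed simulation: static behaviour
    agents = [("A" if k % 2 == 0 else "B", tolerance(rng)) for k in range(n_agents)]
    cells = agents + [None] * (n_cells - n_agents)
    rng.shuffle(cells)
    for _ in range(steps):
        i = rng.randrange(n_cells)
        if cells[i] is None or like_fraction(cells, i) >= cells[i][1]:
            continue                  # empty cell or satisfied agent
        j = rng.choice([k for k, c in enumerate(cells) if c is None])
        cells[i], cells[j] = None, cells[i]   # unhappy agent moves to an empty cell
    return cells

def mean_like_fraction(cells):
    """Segregation measure: average like-neighbour fraction over all agents."""
    idx = [i for i, c in enumerate(cells) if c is not None]
    return sum(like_fraction(cells, i) for i in idx) / len(idx)
```

Calling `simulate()` gives the static case, while `simulate(tolerance=lambda rng: rng.uniform(0.2, 0.8))` gives every agent its own threshold; comparing `mean_like_fraction` for the two runs is one simple way to compare their clustering.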

Summary

Longitudinal data sampling and analysis is a challenge with respect to:

  • Bias, distortion, sampling intervals, impure variables, missing calibration
  • Dimensionality and Data Volume

Agent-based methods with a unified agent model feature:

  • Mobile Crowdsensing sampling environmental and user data on micro scale level
  • Tight coupling of simulation (ABS) with real world (human-in-the-loop)
  • Incremental data collection by software agents synchronises simulation with real world
  • Simulation snapshots and forking enable prediction of future world evolution

Any Questions?

Long-term Longitudinal data collection and analysis in highly dynamic systems using mobile Crowd Sensing and mobile Agents: Challenges and Issues

Challenges and Pitfalls

PD Dr. Stefan Bosse

sbosse@uni-bremen.de, www.ag-0.de

University of Bremen, Dept. Mathematics and Computer Science, Bremen, Germany
