Introduction
This paper gives basic background on the concepts and implementation aspects involved in data storage, cloud usage, and AI pipelines.
It identifies four essential techniques that can help drive AI applications forward: data storage, cloud usage, AI pipeline implementation, and computation strategy.
Data Storage
A biobank is an “Organised database of medical images and associated imaging biomarkers (radiology and beyond) shared among multiple researchers and linked to other bio-repositories”.
The biobank dataset can be used in the Quantitative Imaging Network (QIN) system.
Challenges and considerations of data:
- Labeling the ground truth.
- The metadata of the dataset (provenance, etc.) is also important.
Cloud Usage
Characteristics
On-demand self-service: users can deploy self-defined services, which can expand automatically.
Resource pooling: the cloud provider can dynamically assign or schedule resources across multiple customers.
Rapid elasticity: the ability to auto-scale on demand.
Service Model
Cloud providers offer services such as PaaS, MLaaS, and IaaS.
Deployment Model
Private cloud: Exclusive cloud for one organization.
Community cloud: use is shared among multiple organizations rather than exclusive to one.
Public cloud: open to the public.
AI Pipeline
Supports the whole process from data collection to deployment.
Local Implementation
Networking
The pipeline consists of a local storage system, processors, and a deployment system. One big challenge is to minimize the communication latency
between these components. Solutions include:
- Reducing model precision by storing values in single-precision floating point or lower rather than double precision.
- Compressing gradients by limiting the values of gradient updates to binary values.
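The two ideas above can be sketched in a few lines of Python. This is a minimal illustration, not the method from the paper: the function names are hypothetical, and sharing one scale factor (the mean absolute gradient) alongside the signs is an assumption made here so that the decompressed values keep a sensible magnitude.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sign_compress(grads):
    """Keep only the sign of each gradient plus one shared scale.

    Assumption: the scale is the mean absolute value of the gradients.
    """
    scale = sum(abs(g) for g in grads) / len(grads)
    signs = [1 if g >= 0 else -1 for g in grads]
    return scale, signs

def decompress(scale, signs):
    """Rebuild approximate gradients from the shared scale and the signs."""
    return [scale * s for s in signs]

grads = [0.5, -1.5, 1.0, -1.0]
scale, signs = sign_compress(grads)
approx = decompress(scale, signs)
```

Each gradient now costs one bit plus a single shared float, at the price of losing the individual magnitudes.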
Data Management
Data preparation, storage, processing and exchange between systems.
AI Models
ONNX enables various DL frameworks to share their model properties and co-operate within the same network.
Cloud Implementation
Online annotation, online training.
Distributed/Federated Learning
Methods
Data Parallelism / Model Parallelism
Synchronous Training
May suffer from slow workers, because every step waits for the slowest one.
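A minimal sketch of one synchronous data-parallel step, assuming each worker has already computed a gradient on its own data shard. The function names and the plain SGD update are illustrative choices, not taken from the paper.

```python
def average_gradients(worker_grads):
    """Element-wise mean of the per-worker gradient lists.

    This is the synchronization point: it needs a gradient from every
    worker, so the step cannot finish before the slowest worker reports.
    """
    n = len(worker_grads)
    return [sum(col) / n for col in zip(*worker_grads)]

def sync_step(params, worker_grads, lr=0.1):
    """Apply one SGD update using the averaged gradient."""
    avg = average_gradients(worker_grads)
    return [p - lr * g for p, g in zip(params, avg)]

# Two workers, two parameters.
new_params = sync_step([1.0, 1.0], [[1.0, 2.0], [3.0, 4.0]], lr=0.5)
```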
Bounded synchronous training
Training with stale parameters may reduce performance, so a bound on the allowed staleness is needed.
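One common way to express such a bound is to let a worker run ahead of the slowest worker by at most a fixed number of steps. The sketch below assumes each worker keeps an integer step counter; `may_proceed` and `bound` are hypothetical names used only for illustration.

```python
def may_proceed(worker_clock, all_clocks, bound):
    """Return True if the worker is at most `bound` steps ahead of the slowest worker."""
    return worker_clock - min(all_clocks) <= bound

clocks = [5, 3, 4]                            # step counters of three workers
fast_ok = may_proceed(5, clocks, bound=2)     # 5 - 3 = 2, within the bound
fast_blocked = may_proceed(5, clocks, bound=1)  # 5 - 3 = 2, exceeds the bound
```

With `bound=0` this reduces to fully synchronous training; a larger bound trades parameter freshness for less waiting.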
System architectures
- Centralized parameter server: suffers from the slow worker problem and is a single point of failure.
- Decentralized architecture: communication cost is high, but it is more robust to failure.
- Federated learning architecture: secure parameter transmission with encryption.
- Sequence model training: a model is trained on data from one institution and then adapted to new institutions.
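As an illustration of the federated architecture, the sketch below aggregates parameters from several institutions without moving their data. The note does not say which aggregation rule the paper uses; a sample-size-weighted average (in the style of FedAvg) is assumed here, and the encryption of the transmitted parameters is left out.

```python
def federated_average(site_params, site_sizes):
    """Weighted average of per-site parameter lists.

    Assumption: each site's weight is its number of training samples.
    """
    total = sum(site_sizes)
    dim = len(site_params[0])
    return [
        sum(p[i] * n for p, n in zip(site_params, site_sizes)) / total
        for i in range(dim)
    ]

# Two institutions with 1 and 3 samples respectively.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```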