Asked by: Leonardo Paeper
Is HDFS a data lake?
Similarly, is Elasticsearch a data lake?
A data lake is simply a place to park your data until you need it. It can be built on HDFS (the most common choice), object storage, NAS devices, or almost any other storage system. Elasticsearch, by contrast, is fundamentally a tool for indexing data, not a system for storing the data itself.
Also, why is it called a data lake?
Pentaho CTO James Dixon is credited with coining the term "data lake." In his blog entry, he described it this way: "If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state."
Data lakes and data warehouses are both widely used for storing big data, but the terms are not interchangeable. A data lake is a vast pool of raw data whose purpose has not yet been defined. A data warehouse is a repository of structured, filtered data that has already been processed for a specific purpose.
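To make the distinction concrete, here is a minimal Python sketch. The sample records, the `PurchaseRow` schema, and the `load_warehouse` function are all hypothetical, and plain in-memory lists stand in for real storage such as HDFS or a warehouse database. The "lake" keeps every record exactly as it arrived, including ones that are malformed or irrelevant today; the "warehouse" holds only rows that have been parsed, filtered, and shaped for one specific purpose (here, sales reporting).

```python
import json
from dataclasses import dataclass

# Hypothetical raw events kept in the "lake" exactly as they arrived.
# Nothing is parsed, validated, or filtered on the way in.
data_lake: list[str] = [
    '{"user": "alice", "action": "purchase", "amount": 30.0}',
    '{"user": "bob", "action": "view"}',
    'not even valid JSON',
    '{"user": "carol", "action": "purchase", "amount": 12.5}',
]


@dataclass(frozen=True)
class PurchaseRow:
    """Hypothetical fixed schema chosen up front for one purpose: sales reporting."""
    user: str
    amount: float


def load_warehouse(raw_records: list[str]) -> list[PurchaseRow]:
    """Clean, filter, and structure raw records for the reporting use case."""
    rows = []
    for record in raw_records:
        try:
            event = json.loads(record)
        except json.JSONDecodeError:
            continue  # malformed input is dropped, not loaded
        # Keep only purchase events that carry an amount; everything else is discarded.
        if event.get("action") == "purchase" and "amount" in event:
            rows.append(PurchaseRow(user=event["user"], amount=float(event["amount"])))
    return rows


warehouse = load_warehouse(data_lake)
print(len(data_lake), "raw records in the lake")
print(warehouse)
print("total sales:", sum(row.amount for row in warehouse))
```

The raw lake is left intact, so a different question asked later (say, counting page views) could still be answered from it, whereas the warehouse table has already discarded everything outside its original purpose.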