ClouDigest: Amazon S3


AWS S3 is a cloud-based storage service that provides scalable and secure object storage. It allows storing and retrieving any amount of data from anywhere on the web. With S3, data is organized into buckets, which act as containers for objects with manageable access permissions. S3's high durability and availability make it ideal for storage use cases like backup, data archiving, static website hosting, content distribution, and data lakes. Additionally, it integrates easily with other AWS services.



ClouDatum:

- S3 bucket names must be globally unique.

- S3 automatically replicates data across multiple Availability Zones within a Region, ensuring data durability.

- S3 provides immediate read-after-write consistency for uploaded data.



S3 Data Consistency Model:

S3 employs a strong read-after-write consistency model for all object operations, including PUT, GET, DELETE, and LIST. This means that after you successfully write an object to S3, you can immediately read the latest version of that object from any location.

All reads are consistent reads. This means that you can always be sure that you are reading the latest version of an object, regardless of where you are reading it from.


Operation   Consistency
PUT         Strong read-after-write
GET         Strong read-after-write
DELETE      Strong read-after-write
LIST        Strong read-after-write



The one caveat concerns buckets rather than objects. When you create a new bucket, its name takes a short time to propagate through the system, and changes to bucket configurations (for example, enabling versioning) are eventually consistent. During that propagation window, requests to the new bucket may receive a temporary redirect (HTTP 307) before they start succeeding normally.


This will not affect most applications. However, if your application reads from a bucket immediately after creating it, it should be prepared to retry briefly.


Here is an example of how this might play out:


1. You create a new bucket called "for-blog-my-bucket". (The name should be unique, remember?)

2. You immediately upload an object called "my-object" to the bucket.

3. You immediately try to read the object back.

4. While the new bucket name is still propagating, the request may receive a temporary redirect or fail.

5. You retry after a short delay.

6. The request now succeeds and returns the latest version of the object.


This delay applies only to newly created buckets and is temporary. Once the bucket name has fully propagated, reads and writes behave with the strong consistency described above.
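A client can guard against this short window with a simple retry loop. The sketch below uses a stubbed fetch function rather than a real S3 call; the error class and stub are hypothetical stand-ins:

```python
import time

class TransientError(Exception):
    """Stand-in for a transient failure such as a 307 redirect."""

def get_with_retry(fetch, attempts=5, base_delay=0.5):
    """Call `fetch`, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch()
        except TransientError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Stub that fails twice (as a brand-new bucket might) and then succeeds.
calls = {"n": 0}
def fake_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("307 Temporary Redirect")
    return b"object body"

print(get_with_retry(fake_fetch, base_delay=0.01))  # prints: b'object body'
```

The same pattern (retry with backoff) is what most AWS SDKs already do internally for retryable errors.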

Overall, the S3 consistency model is very strong. It provides you with the assurance that you are always reading the latest version of your data, even when you are reading from multiple locations. This makes it a very reliable storage option for a wide variety of applications.


S3 Storage Hierarchy:

Amazon Simple Storage Service (S3) stores data in a hierarchy of buckets, objects, and folders.


Buckets are the top-level containers in S3. They are named with a unique identifier and can store an unlimited number of objects.

Objects are the basic unit of storage in S3. They can be any type of file, including text, images, videos, and binary data. Objects are identified by a unique key.

Folders (also known as prefixes) are a logical way to organize objects within a bucket. They are created by using forward slashes ('/') in the object keys. For example, the object key images/product_images/product_1.jpg would be stored in the logical folder images/product_images.

It is important to note that S3 is an object storage system, not a traditional file system. This means that there are no actual directories or subdirectories in S3. The folders are simply a way to organize objects and make them easier to find.


So, there is no true hierarchy? 

Yes, that's correct. There is no true hierarchy in Amazon S3. The folders that you create are simply a way to organize your objects and make them easier to find. They are not actual directories or subdirectories in the traditional sense.

For example, if you create a folder called images/product_images, this folder does not actually exist in S3. It is simply a way to refer to a group of objects that have the same prefix. In this case, all of the objects that start with the prefix images/product_images will be considered to be in the folder images/product_images.
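The prefix behaviour described above can be sketched in a few lines of Python. This is a toy model of how a delimited listing groups flat keys into "folders"; a real listing would use the S3 API's Prefix and Delimiter parameters, and the keys below are made up:

```python
# Simulate how S3 derives "folders" from object keys using a prefix
# and the '/' delimiter; S3 itself stores only flat keys.

def list_keys(keys, prefix="", delimiter="/"):
    """Return (objects, common_prefixes), mimicking a delimited listing."""
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the first delimiter becomes a "folder".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)

keys = [
    "images/product_images/product_1.jpg",
    "images/product_images/product_2.jpg",
    "images/banner.png",
    "readme.txt",
]

print(list_keys(keys, prefix="images/"))
# The "folder" images/product_images/ is just a shared key prefix.
```

Listing with prefix `images/` returns `images/banner.png` as an object and `images/product_images/` as a common prefix, which is exactly how the console renders folders.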



ClouDatum:

- By default, each account can create up to 100 buckets; this is a soft limit that can be raised through a service quota increase.

- The endpoint for the default N. Virginia region has the format s3.amazonaws.com. For buckets in other regions, the endpoint format is s3.<region name>.amazonaws.com (for example, s3.eu-west-1.amazonaws.com).

- You can access a specific object by using its path in the URL.

- S3 bucket names must be globally unique, 3-63 characters long, and can only contain lowercase letters (a-z), numbers (0-9), hyphens (-), and periods (.); they must begin and end with a letter or number.
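As a quick illustration of the naming rules above, here is a sketch of a validator. It covers only the core rules listed here; AWS applies a few more (for example, forbidding names formatted like IP addresses):

```python
import re

# Core rules from the notes above: 3-63 characters, lowercase letters,
# digits, hyphens and periods, starting and ending with a letter or digit.
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def is_valid_bucket_name(name: str) -> bool:
    return bool(BUCKET_NAME_RE.match(name))

print(is_valid_bucket_name("for-blog-my-bucket"))  # True
print(is_valid_bucket_name("My_Bucket"))           # False (uppercase, underscore)
```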



S3 Object metadata:

S3 metadata is additional information attached to objects stored in Amazon S3, providing valuable insights and context.


Date and time: S3 object metadata includes the upload date and time.

Size: It also includes the size of the object, indicating its storage space.

Last modified: S3 tracks the last modification date and time of the object.

Server-side encryption: The metadata indicates whether server-side encryption is enabled for the object.

Version ID: In versioning-enabled buckets, S3 assigns a unique version ID to each object version to track changes and revisions.

Delete marker: In versioning-enabled buckets, deleting an object adds a delete marker to indicate its removal.

Storage classes: S3 object metadata includes the storage class, which determines the object's durability, availability, and cost.


Using S3 metadata, users can enhance object management, streamline data operations, and make informed decisions regarding storage, security, and compliance requirements.


S3 Versioning:

Amazon S3 versioning allows you to keep multiple versions of the same object in a bucket. This can be helpful if you need to keep track of changes to an object over time, or if you need to restore a previous version of an object in case of accidental deletion.

To enable versioning on a bucket, you can use the AWS Management Console, the AWS CLI, or the AWS SDKs. Once versioning is enabled, any new object that you upload to the bucket will be stored in a new version. If you upload an object that already exists in the bucket, a new version of the object will be created with the updated content.

When you download an object from a bucket that has versioning enabled, you will always download the latest version of the object. You can also view and restore previous versions of objects from the Versioning page in the AWS Management Console.

To completely delete a versioned object, you need to delete it twice. A normal delete does not remove any data; instead, it adds a delete marker, which hides the object from the bucket's default object list. All previous versions, and the marker itself, remain visible when you show versions.

To permanently delete a versioned object, you must delete each of its versions (and any delete marker) explicitly, specifying the version ID. Only then is the object removed from S3 completely.


1. You upload an object called myfile.txt to a bucket called mybucket.

2. You upload new content to the same key, myfile.txt. S3 keeps the old content as a previous version and makes the new upload the current version.

3. You delete myfile.txt with a normal delete. S3 adds a delete marker.

4. The object disappears from the default object list, but both versions (and the delete marker) are still visible when you show versions.

5. You delete the delete marker and each version of myfile.txt, specifying their version IDs.

6. The object myfile.txt is now permanently deleted from S3.
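The steps above can be modelled with a toy in-memory version store. This is purely illustrative; a real bucket would use delete calls with and without a version ID:

```python
import itertools

class VersionedBucket:
    """Toy model of S3 versioning: a simple delete adds a delete marker;
    deleting a specific version ID removes data permanently."""

    _ids = itertools.count(1)

    def __init__(self):
        self.versions = {}  # key -> list of (version_id, body); last = current

    def put(self, key, body):
        vid = f"v{next(self._ids)}"
        self.versions.setdefault(key, []).append((vid, body))
        return vid

    def delete(self, key):
        # A normal delete stores a marker (body None) as the current version.
        vid = f"v{next(self._ids)}"
        self.versions.setdefault(key, []).append((vid, None))
        return vid

    def get(self, key):
        stack = self.versions.get(key, [])
        if not stack or stack[-1][1] is None:
            return None  # no object, or hidden behind a delete marker
        return stack[-1][1]

    def delete_version(self, key, version_id):
        # Version-specific delete: removes that version (or marker) for good.
        self.versions[key] = [(v, b) for v, b in self.versions[key]
                              if v != version_id]

bucket = VersionedBucket()
bucket.put("myfile.txt", "first draft")
bucket.put("myfile.txt", "second draft")      # new version, same key
marker = bucket.delete("myfile.txt")          # hides the object
print(bucket.get("myfile.txt"))               # prints: None
bucket.delete_version("myfile.txt", marker)   # remove the marker
print(bucket.get("myfile.txt"))               # prints: second draft
```

Removing just the delete marker "undeletes" the object, because the previous version becomes current again, mirroring how S3 behaves.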


Storage classes in S3:


S3 Standard: The default storage class for frequently accessed data, offering high durability, availability, and low latency.

S3 Intelligent-Tiering: Automatically moves data between frequent and infrequent access tiers based on usage patterns, optimizing costs without sacrificing performance.

S3 Standard-IA (Infrequent Access): Ideal for data that is accessed less frequently but requires rapid access when needed, providing cost savings compared to S3 Standard.

S3 One Zone-IA: Similar to S3 Standard-IA but stores data in a single availability zone, offering cost savings for applications that do not require multiple zone redundancy.

S3 Glacier Instant Retrieval: For long-lived data that is rarely accessed and requires retrieval in milliseconds.

S3 Glacier Flexible Retrieval (formerly S3 Glacier): For archive data that is accessed 1-2 times per year and is retrieved asynchronously (minutes to hours).

S3 Glacier Deep Archive: The lowest-cost storage class, for long-term archives accessed less than once a year, with retrieval times of up to 12 hours.


S3 Standard can be cheaper than S3 Glacier, but there is a catch!

S3 Standard can work out cheaper than S3 Glacier for temporary yet frequently accessed data stored for short durations, typically a few hours or days, because the Glacier classes charge retrieval fees and bill a minimum storage duration.

For reproducible and less frequently accessed data, S3 Infrequent Access or S3 One-Zone IA tiers can be more cost-effective compared to using S3 Glacier, as Glacier incurs a minimum storage duration fee.

S3 Glacier Deep Archive has the highest minimum storage duration of 180 days. If data is stored for a shorter period and then changed or deleted, you still need to pay for the remaining days.

Archive retrieval fees should also be considered, especially for frequently accessed objects. Expedited retrievals are available but come with additional costs.

Overall, understanding the access patterns and lifecycle of your data can help you choose the most cost-efficient storage class in Amazon S3 based on your specific requirements.
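To see how minimum-duration and retrieval fees can flip the comparison, here is a back-of-the-envelope calculation in Python. All prices are hypothetical placeholders, not current AWS rates:

```python
def storage_cost(days, price_per_gb_month, min_days=0):
    """Cost of storing 1 GB for `days`; classes with a minimum storage
    duration bill at least `min_days` even if data is deleted sooner."""
    return max(days, min_days) / 30 * price_per_gb_month

# Hypothetical per-GB prices, for illustration only (not current AWS rates).
STANDARD_GB_MONTH = 0.023
DEEP_ARCHIVE_GB_MONTH = 0.00099
DEEP_ARCHIVE_RETRIEVAL_GB = 0.02   # plus a per-GB retrieval fee

retrievals = 5  # data read five times during its 10-day life

standard_total = storage_cost(10, STANDARD_GB_MONTH)  # retrieval is free
deep_total = (storage_cost(10, DEEP_ARCHIVE_GB_MONTH, min_days=180)
              + retrievals * DEEP_ARCHIVE_RETRIEVAL_GB)

print(f"Standard:     ${standard_total:.4f} per GB")
print(f"Deep Archive: ${deep_total:.4f} per GB")
# Minimum-duration and retrieval fees make the "cheap" class the
# expensive one for short-lived, frequently accessed data.
```

With these illustrative numbers, Deep Archive ends up roughly an order of magnitude more expensive than Standard for this workload, driven almost entirely by retrieval fees.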


Amazon S3 Lifecycle Management:

Amazon S3 Lifecycle Management is a feature that allows you to automatically move objects between different storage classes based on their age or access patterns. This can help you save money on storage costs by moving objects to less expensive storage classes when they are no longer frequently accessed.


You can use Lifecycle Management to move objects from:

- S3 Standard to any other storage class.

- Any storage class to S3 Glacier or S3 Glacier Deep Archive.

- S3 Glacier to S3 Glacier Deep Archive only.

Note that you cannot move objects "backwards" in storage class. For example, you cannot move an object from S3 Standard-IA back to S3 Standard.


Objects must be stored in S3 Standard for at least 30 days before a lifecycle rule can transition them to S3 Standard-IA or S3 One Zone-IA. You can also choose to expire (delete) objects after a certain period of time.


You can apply Lifecycle Management rules to prefixes ("folders"), subsets of objects selected by filters or tags, or entire buckets.
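As a concrete illustration, a lifecycle rule like the ones described above can be written as follows, in the structure accepted by AWS SDKs such as boto3's put_bucket_lifecycle_configuration (the prefix and day counts are arbitrary examples):

```python
# A lifecycle rule in the structure used by AWS SDKs such as boto3
# (illustrative values; the prefix and day counts are examples):
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": "logs/"},  # applies to the logs/ "folder"
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # after 30 days
                {"Days": 90, "StorageClass": "GLACIER"},      # after 90 days
            ],
            "Expiration": {"Days": 365},    # delete a year after creation
        }
    ]
}

# Transitions only move "forward" and respect the 30-day minimum:
days = [t["Days"] for t in lifecycle_configuration["Rules"][0]["Transitions"]]
assert days == sorted(days) and days[0] >= 30
print("lifecycle rule OK")
```

The same dictionary could be passed straight to an S3 client's lifecycle call; here it just documents the rule's shape.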



ClouDatum: 

- Amazon S3 allows you to analyze up to 100 buckets at a time.

- You can analyze the entire contents of a bucket, or you can analyze specific objects or prefixes within a bucket.

- You can also filter your analysis by tags.

The following factors are considered during storage class analysis:

- Data retrieved out.

- Percent of storage retrieved.

- Percent of storage infrequently accessed.



Amazon S3 Inventory: is a feature that allows you to generate a comprehensive list of object metadata, such as key (name), size, storage class, last modified date, and other relevant information. The inventory report can be scheduled to run at regular intervals, enabling you to automate the process of generating inventory reports for your S3 buckets. The report is delivered to a destination S3 bucket that you specify, and can then be queried with tools such as Amazon Athena or Amazon Redshift Spectrum for further analysis.


Here are some additional details about Amazon S3 Inventory:

- An inventory report is configured per source bucket; reports from many buckets can be delivered to the same destination bucket.

- You can specify the frequency of the inventory report, such as daily, weekly, or monthly.

- You can choose the format of the inventory report, such as CSV, ORC, or Parquet.

- You can specify the destination of the inventory report, which must be another S3 bucket (in the same or a different account).
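Putting those options together, an inventory configuration might look like this, in the shape used by AWS SDKs such as boto3's put_bucket_inventory_configuration (the bucket ARN and prefix are placeholders):

```python
# An inventory configuration in the structure used by AWS SDKs such as
# boto3 (names and ARNs below are placeholders, not real resources):
inventory_configuration = {
    "Id": "weekly-csv-inventory",
    "IsEnabled": True,
    "IncludedObjectVersions": "All",      # or "Current"
    "Schedule": {"Frequency": "Weekly"},  # "Daily" or "Weekly"
    "Destination": {
        "S3BucketDestination": {
            "Bucket": "arn:aws:s3:::my-inventory-reports",  # destination bucket
            "Format": "CSV",              # CSV, ORC or Parquet
            "Prefix": "inventory/source-bucket",
        }
    },
    "OptionalFields": ["Size", "LastModifiedDate", "StorageClass", "ETag"],
}

print(inventory_configuration["Id"], "->",
      inventory_configuration["Destination"]["S3BucketDestination"]["Bucket"])
```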


Amazon S3 Inventory can be a useful tool for a variety of purposes, such as compliance auditing, data analysis, cost optimisation, and data migration.


Amazon S3 Cross-Region Replication (CRR): is a feature that automatically replicates objects from one S3 bucket to another S3 bucket in a different AWS Region. CRR requires versioning to be enabled on both the source and destination buckets, and it is asynchronous, which means it can take some time for objects to appear in the destination bucket.

By default, replicas are not replicated again: a replication rule acts only on objects written directly to its source bucket, so objects that arrive as replicas are not passed on to a further Region by that rule. (The same mechanism also powers S3 Same-Region Replication, SRR, which replicates between buckets in the same Region.)

Also worth knowing:

- CRR is not free: you pay for storage in the destination Region, inter-Region data transfer, and replication requests.

- CRR can replicate objects to any AWS Region that you have access to.

- Existing objects (uploaded before the rule was created) can be replicated with S3 Batch Replication.

- Replicas can be stored in a different storage class than the source objects.

- Object tags are replicated along with the objects.
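A minimal CRR rule, in the shape used by AWS SDKs such as boto3's put_bucket_replication, might look like this (the role and bucket ARNs are placeholders, and versioning must already be enabled on both buckets):

```python
# A CRR configuration in the structure used by AWS SDKs such as boto3
# (role and bucket ARNs are placeholders, not real resources).
replication_configuration = {
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
    "Rules": [
        {
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},                       # empty filter: all objects
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": "arn:aws:s3:::my-replica-bucket-eu-west-1",
                "StorageClass": "STANDARD_IA",  # replicas may use a cheaper class
            },
        }
    ],
}

print(replication_configuration["Rules"][0]["Destination"]["Bucket"])
```

The Destination's StorageClass field is how replicas end up in a different storage class than the source objects, as noted above.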


Encryption in Amazon S3:

Amazon Simple Storage Service (S3) offers two types of encryption: server-side encryption (SSE) and client-side encryption (CSE).


Server-side encryption is encryption that is performed by Amazon S3. With SSE, Amazon S3 encrypts your data at rest as it writes it to disk and decrypts it when you access it. This means that even if someone gains unauthorized access to the underlying storage, they will not be able to read the data without the encryption key.

There are three different methods of SSE:

SSE-S3 uses Amazon S3-managed keys (AES-256) and is applied to new objects by default. This is the simplest and most cost-effective way to encrypt your data.

SSE-KMS uses AWS Key Management Service (KMS). This gives you more control over the encryption keys, but it also adds an additional layer of complexity.

SSE-C allows you to provide your own encryption keys. This gives you the most control over the encryption keys, but it also requires you to manage the keys yourself.


Client-side encryption is encryption that is performed by the client before the data is sent to Amazon S3. With CSE, the client encrypts the data using their own encryption key. This means that Amazon S3 does not have access to the encryption key, so they cannot decrypt the data.

CSE can be done using either AWS KMS or your own encryption key.


The best encryption method for you will depend on your specific needs and requirements. If you are looking for a simple and cost-effective solution, then SSE-S3 is a good option. If you need more control over the encryption keys, then SSE-KMS or SSE-C may be a better choice. If you need to encrypt data that will be accessed by clients, then CSE is a good option.
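At the API level, the choice between these modes comes down to request headers on the upload. The mapping below lists the relevant S3 REST headers for each mode; the key values are placeholders:

```python
# Request headers that select each server-side encryption mode on a PUT
# (header names from the S3 REST API; the key values are placeholders):
SSE_HEADERS = {
    "SSE-S3": {
        "x-amz-server-side-encryption": "AES256",
    },
    "SSE-KMS": {
        "x-amz-server-side-encryption": "aws:kms",
        # Optional: a specific KMS key; otherwise the default key is used.
        "x-amz-server-side-encryption-aws-kms-key-id": "<your-kms-key-arn>",
    },
    "SSE-C": {
        # The client supplies the key with every request; S3 never stores it.
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": "<base64-encoded-key>",
        "x-amz-server-side-encryption-customer-key-MD5": "<base64-key-md5>",
    },
}

for mode, headers in SSE_HEADERS.items():
    print(mode, "->", ", ".join(headers))
```

Note the asymmetry: SSE-S3 and SSE-KMS need a header only on upload, while SSE-C requires the customer key headers on every request that touches the object.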


Server access logging: is a feature of Amazon S3 that allows you to track requests made to your S3 buckets. You enable it per bucket, and the logs are delivered to a separate target S3 bucket in the same Region. You can then analyze the logs with tools such as Amazon Athena.


The logs that are generated by server access logging include information about each request that is made to your S3 bucket, including:

- The requester's IP address.

- The time of the request.

- The request method.

- The request URI.

- The response status code.

Server access logs also record other information, such as the user agent that made the request and the amount of data that was transferred.


Server access logging can be a valuable tool for auditing your S3 bucket access and troubleshooting problems. For example, you can use server access logs to identify unauthorized access to your S3 buckets or to track down the source of performance problems.

If you are looking for a way to track access to your S3 buckets, then you should consider enabling server access logging. It is a valuable tool for auditing your S3 bucket access and troubleshooting problems.
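As a sketch of what analyzing these logs involves, here is a small parser for a simplified, hypothetical access log line. Real S3 access logs contain many more fields in the same space-delimited style (bucket owner, bucket, time, remote IP, requester, request ID, operation, key, request-URI, status, ...):

```python
import re
import shlex

# A simplified, hypothetical server access log line (real logs have
# more trailing fields, in the same space-delimited format).
LINE = ('owner123 my-bucket [06/Feb/2024:00:00:38 +0000] 192.0.2.3 '
        'requester123 REQID1234 REST.GET.OBJECT images/banner.png '
        '"GET /images/banner.png HTTP/1.1" 200')

def parse_access_log(line):
    # Quote the bracketed timestamp so it splits as a single token,
    # then let shlex handle the quoted request line.
    merged = re.sub(r"\[([^\]]+)\]", lambda m: '"' + m.group(1) + '"', line)
    fields = shlex.split(merged)
    return {
        "bucket": fields[1],
        "time": fields[2],
        "remote_ip": fields[3],
        "operation": fields[6],
        "key": fields[7],
        "request_uri": fields[8],
        "status": int(fields[9]),
    }

print(parse_access_log(LINE))
```

For production use you would query the raw logs with Athena rather than hand-parsing them, but the field layout is the same.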


S3 Access Points and VPC Endpoints:

S3 Access Points are unique entry points to your S3 buckets that can be used to simplify permission management, apply fine-grained access control, and enhance the security, manageability, and scalability of your S3 bucket access control.

S3 Access Points can also be restricted to a Virtual Private Cloud (VPC), which lets EC2 instances in private subnets with no internet access reach S3. This is done by creating a VPC endpoint for S3 in the private subnet and limiting the access point to that VPC.

When an EC2 instance in a private subnet makes a request through such a VPC-restricted access point, the request travels over the VPC endpoint instead of the public internet. This keeps the data path inside the AWS network and ensures that only authorized principals can reach the bucket.


Here are some of the key benefits of using S3 Access Points:


- Simplified permission management: You can create a separate access policy for each S3 Access Point, which makes it easier to control who can access the associated bucket and what actions they can perform.

- Fine-grained access control: You can define access policies for each S3 Access Point to specify allowed actions (e.g., read, write, delete) and conditions based on factors like IP address, time of day, or encryption requirements. This enables you to apply more granular access control at the access point level.

- Improved security: S3 Access Points can be used to connect EC2 instances in private subnets with no internet access to S3. This helps to improve the security of your data by preventing unauthorized access.

- Enhanced scalability: S3 Access Points can be used to improve the scalability of your S3 bucket access control. This is because you can create multiple S3 Access Points for a single bucket, which can help to distribute traffic and improve performance.



Amazon S3 is a powerful and versatile object storage service that offers scalable and secure storage for your data. With features like S3 Access Points, you can simplify permission management, apply fine-grained access control, and enhance security for your S3 buckets. Whether you need to store and retrieve large amounts of data, host static websites, or build data-intensive applications, Amazon S3 provides a reliable and flexible solution for your storage needs.


You can also find more information about S3 on the AWS website here.


Start hosting your static website on S3 today! Refer Here.





















