Dataframes and Lists / Data Types / Docker Builds / PSQL
The code goes back and forth between dataframes and lists. This must be ineffecient.
There are multiple lists. Elements at the same index in each list are a set.
If I remove an element from a list, the index of every subsequent element changes.
Current solution: Make every list a column in a dataframe. Remove a row in a dataframe. Then the removal is concurrent.
Why not just stick to dataframes? I do some math manipulation on all elements in a list. This might be faster than operating on every element in a column in a dataframe. I haven’t checked.
Scaling up with VMs, docker containers inside VMs, and VMs managed by an orchestration software.
After creating a dockerfile, use docker build /path/to/dockerfile

Getting rid of the comment in the EXPOSE line resolved the error, i.e. syntax error. The docker file was used to build an image. The image size 563 Mb.

Another image was used to launch a container interactively in the shell, use flat -it for ‘interactive terminal’ and specify that /bin/bash is the command to be executed. A python script successfully runs inside the docker container, demo’d by printing hello.

Since nginx was installed, the container has a default dir structure: /var/www/html and an index.html file at this location.

at /bin ls shows the following psql files

The user installed programs are not /bin , instead they are at /usr/lib

If modify config file, restart the service for changes to take effect:
service postgresql restart
Authentication of a user can be via md5, via peer, or via scram-sha-256.

Switch to a user called postgres. Now we are in a VM, and then in a docker container, and able to access a db.
su — postgres [there is a space b/w the su and the dash and postgres, medium reformats]
When the docker container is started using an image, the postgresql server is installed because it was in the dockerfile. Must start the service:
service postgresql start

Entering and exiting the psql command line has two relevant commands: psql and \q . See the change in the command line.

Created 3 tables: bikes, planets; customers.

When writing sql queries to join tables, the order of tables specified by ON does not affect the result.

When adding a new column, it is necessary to declare the name and the data type for the values in the column.

Join more than 2 tables, as many as needed, by using multiple joins. The query below has 2 joins and aggregates data from 3 tables: customers, bikes, planets based on aligning the column ‘name’ from each of those tables. Resulting table has some redundancy, such as the column ‘name’ is repeated 3 times.

The results of a query can be saved as a new table or inserted into another table. Three Rows from ‘bikes’ are inserted into ‘cheap_bikes’ based on a price condition. Maya, Jill, and Jen have low price bikes.

The Cheap_Bikes table already had some information, such as about Maya’s bike, and now there is a repetition of data.

Types supported by PSQL:
1. Numeric Types
- Integer Types:
SMALLINT
: 2-byte integer, range: -32,768 to +32,767INTEGER
orINT
: 4-byte integer, range: -2,147,483,648 to +2,147,483,647BIGINT
: 8-byte integer, range: -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807- Serial Types (Auto-increment):
SMALLSERIAL
: 2-byte auto-incrementing integerSERIAL
: 4-byte auto-incrementing integerBIGSERIAL
: 8-byte auto-incrementing integer- Decimal Types:
NUMERIC(precision, scale)
: User-specified precision, exact decimal values, useful for monetary amounts.DECIMAL(precision, scale)
: Synonym forNUMERIC
.- Floating-Point Types:
REAL
: 4-byte floating-point number.DOUBLE PRECISION
: 8-byte floating-point number.
2. Monetary Types
MONEY
: Represents currency values with fixed fractional precision, formatted according to locale.
3. Character Types
CHAR(n)
: Fixed-length character, padded with spaces if necessary.VARCHAR(n)
: Variable-length character with a specified limit.TEXT
: Variable-length character with no limit, useful for large text blocks.
4. Binary Data Types
BYTEA
: Stores binary data, useful for files, images, or other binary content.
5. Date and Time Types
DATE
: Calendar date (year, month, day).TIME [WITHOUT TIME ZONE]
: Time of day without time zone.TIME WITH TIME ZONE
: Time of day with time zone.TIMESTAMP [WITHOUT TIME ZONE]
: Date and time without time zone.TIMESTAMP WITH TIME ZONE
(aliasTIMESTAMPTZ
): Date and time with time zone.INTERVAL
: Represents a span of time (e.g., days, hours, minutes).
6. Boolean Type
BOOLEAN
: StoresTRUE
,FALSE
, orNULL
.
7. Enumerated Types
ENUM
: User-defined list of allowed values (e.g.,ENUM('small', 'medium', 'large')
).
8. Geometric Types
POINT
: Stores a geometric point (x, y).LINE
: Stores a geometric line.LSEG
: Stores a line segment.BOX
: Stores a rectangular box.PATH
: Stores a path (open or closed).POLYGON
: Stores a closed polygon.CIRCLE
: Stores a circle.
9. Network Address Types
- Network Types:
CIDR
: IPv4 or IPv6 network address.INET
: IPv4 or IPv6 host address.MACADDR
: MAC address.
10. UUID Type
UUID
: Stores universally unique identifiers (UUIDs), often used as primary keys.
11. JSON Types
JSON
: Stores JSON data in plain text format.JSONB
: Stores JSON data in a binary format, allowing faster access and indexing.
12. Array Types
- Supports arrays of any data type (e.g.,
INTEGER[]
,TEXT[]
), useful for storing multiple values in a single column.
13. Range Types
INT4RANGE
: Range of integer values.NUMRANGE
: Range of numeric values.TSRANGE
: Range of timestamps (without time zone).TSTZRANGE
: Range of timestamps (with time zone).DATERANGE
: Range of dates.
14. Composite Types
- Allows defining custom data types that group multiple fields (similar to a struct in other languages).
15. Special Types
TSVECTOR
: Used for full-text search.TSQUERY
: Used to define a text search query.XML
: Stores XML data.VOID
: Represents a void return type, primarily used for functions.