Dataframes and Lists / Data Types / Docker Builds / PSQL

Anudha Mittal
6 min readMar 30, 2024

The code goes back and forth between dataframes and lists. This must be ineffecient.

There are multiple lists. Elements at the same index in each list are a set.

If I remove an element from a list, the index of every subsequent element changes.

Current solution: Make every list a column in a dataframe. Remove a row in a dataframe. Then the removal is concurrent.

Why not just stick to dataframes? I do some math manipulation on all elements in a list. This might be faster than operating on every element in a column in a dataframe. I haven’t checked.

Scaling up with VMs, docker containers inside VMs, and VMs managed by an orchestration software.

After creating a dockerfile, use docker build /path/to/dockerfile

Getting rid of the comment in the EXPOSE line resolved the error, i.e. syntax error. The docker file was used to build an image. The image size 563 Mb.

Another image was used to launch a container interactively in the shell, use flat -it for ‘interactive terminal’ and specify that /bin/bash is the command to be executed. A python script successfully runs inside the docker container, demo’d by printing hello.

Since nginx was installed, the container has a default dir structure: /var/www/html and an index.html file at this location.

at /bin ls shows the following psql files

The user installed programs are not /bin , instead they are at /usr/lib

If modify config file, restart the service for changes to take effect:

service postgresql restart

Authentication of a user can be via md5, via peer, or via scram-sha-256.

CONFIG FILE FOR PSQL

Switch to a user called postgres. Now we are in a VM, and then in a docker container, and able to access a db.

su — postgres [there is a space b/w the su and the dash and postgres, medium reformats]

When the docker container is started using an image, the postgresql server is installed because it was in the dockerfile. Must start the service:

service postgresql start

Entering and exiting the psql command line has two relevant commands: psql and \q . See the change in the command line.

Created 3 tables: bikes, planets; customers.

When writing sql queries to join tables, the order of tables specified by ON does not affect the result.

When adding a new column, it is necessary to declare the name and the data type for the values in the column.

Join more than 2 tables, as many as needed, by using multiple joins. The query below has 2 joins and aggregates data from 3 tables: customers, bikes, planets based on aligning the column ‘name’ from each of those tables. Resulting table has some redundancy, such as the column ‘name’ is repeated 3 times.

The results of a query can be saved as a new table or inserted into another table. Three Rows from ‘bikes’ are inserted into ‘cheap_bikes’ based on a price condition. Maya, Jill, and Jen have low price bikes.

SQL query result → adds on rows to another table that was already in the DB

The Cheap_Bikes table already had some information, such as about Maya’s bike, and now there is a repetition of data.

Types supported by PSQL:

1. Numeric Types

  • Integer Types:
  • SMALLINT: 2-byte integer, range: -32,768 to +32,767
  • INTEGER or INT: 4-byte integer, range: -2,147,483,648 to +2,147,483,647
  • BIGINT: 8-byte integer, range: -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807
  • Serial Types (Auto-increment):
  • SMALLSERIAL: 2-byte auto-incrementing integer
  • SERIAL: 4-byte auto-incrementing integer
  • BIGSERIAL: 8-byte auto-incrementing integer
  • Decimal Types:
  • NUMERIC(precision, scale): User-specified precision, exact decimal values, useful for monetary amounts.
  • DECIMAL(precision, scale): Synonym for NUMERIC.
  • Floating-Point Types:
  • REAL: 4-byte floating-point number.
  • DOUBLE PRECISION: 8-byte floating-point number.

2. Monetary Types

  • MONEY: Represents currency values with fixed fractional precision, formatted according to locale.

3. Character Types

  • CHAR(n): Fixed-length character, padded with spaces if necessary.
  • VARCHAR(n): Variable-length character with a specified limit.
  • TEXT: Variable-length character with no limit, useful for large text blocks.

4. Binary Data Types

  • BYTEA: Stores binary data, useful for files, images, or other binary content.

5. Date and Time Types

  • DATE: Calendar date (year, month, day).
  • TIME [WITHOUT TIME ZONE]: Time of day without time zone.
  • TIME WITH TIME ZONE: Time of day with time zone.
  • TIMESTAMP [WITHOUT TIME ZONE]: Date and time without time zone.
  • TIMESTAMP WITH TIME ZONE (alias TIMESTAMPTZ): Date and time with time zone.
  • INTERVAL: Represents a span of time (e.g., days, hours, minutes).

6. Boolean Type

  • BOOLEAN: Stores TRUE, FALSE, or NULL.

7. Enumerated Types

  • ENUM: User-defined list of allowed values (e.g., ENUM('small', 'medium', 'large')).

8. Geometric Types

  • POINT: Stores a geometric point (x, y).
  • LINE: Stores a geometric line.
  • LSEG: Stores a line segment.
  • BOX: Stores a rectangular box.
  • PATH: Stores a path (open or closed).
  • POLYGON: Stores a closed polygon.
  • CIRCLE: Stores a circle.

9. Network Address Types

  • Network Types:
  • CIDR: IPv4 or IPv6 network address.
  • INET: IPv4 or IPv6 host address.
  • MACADDR: MAC address.

10. UUID Type

  • UUID: Stores universally unique identifiers (UUIDs), often used as primary keys.

11. JSON Types

  • JSON: Stores JSON data in plain text format.
  • JSONB: Stores JSON data in a binary format, allowing faster access and indexing.

12. Array Types

  • Supports arrays of any data type (e.g., INTEGER[], TEXT[]), useful for storing multiple values in a single column.

13. Range Types

  • INT4RANGE: Range of integer values.
  • NUMRANGE: Range of numeric values.
  • TSRANGE: Range of timestamps (without time zone).
  • TSTZRANGE: Range of timestamps (with time zone).
  • DATERANGE: Range of dates.

14. Composite Types

  • Allows defining custom data types that group multiple fields (similar to a struct in other languages).

15. Special Types

  • TSVECTOR: Used for full-text search.
  • TSQUERY: Used to define a text search query.
  • XML: Stores XML data.
  • VOID: Represents a void return type, primarily used for functions.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

No responses yet

Write a response