Data representation

data and data representation

Computers use binary (the digits 0 and 1) to store data. A binary digit, or bit, is the smallest unit of data in computing; it is either 0 or 1. Binary numbers are made up of bits, e.g. the binary number 1001. A computer's processor contains billions of transistors. A transistor is a tiny switch that is turned on or off by the electronic signals it receives, and the digits 1 and 0 used in binary reflect these on and off states. Computer programs are sets of instructions. Each instruction is translated into machine code: simple binary codes that the CPU can execute. Programmers write source code, and a translator converts it into binary instructions that the processor can execute. All software, music, documents and any other information processed by a computer are also stored in binary. [1]
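As a small illustration of how a string of bits such as 1001 denotes a number, here is a sketch in C++ (the language used in the later notes on this page). The function name binary_to_value is hypothetical; it reads the bits left to right, doubling the running total each time, and assumes the input contains only '0' and '1'.

```cpp
// Hypothetical helper: convert a string of binary digits such as "1001"
// to its value. Each step shifts the running value one place left
// (multiplies by 2) and adds the next bit. Only '0' and '1' are expected.
constexpr unsigned long long binary_to_value(const char* bits) {
    unsigned long long value = 0;
    for (; *bits != '\0'; ++bits) {
        value = value * 2 + (*bits == '1' ? 1 : 0);
    }
    return value;
}
```

So "1001" is 1×8 + 0×4 + 0×2 + 1×1 = 9.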




How a file is stored on a computer

How an image is stored in a computer

The way in which data is represented in the computer

To include strings, integers, characters and colours. This should include considering the space taken by data, for instance the relation between the hexadecimal representation of colours and the number of colours available [3].
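To make the colour relation concrete, the sketch below assumes the common web-style "#RRGGBB" notation (an assumption; the syllabus text does not name a format), in which two hex digits encode one 8-bit channel. Six hex digits therefore cover 24 bits, or 2^24 = 16,777,216 colours. All helper names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers; assumes web-style "#RRGGBB" colour strings.

// n bits per pixel give 2^n distinct colours (each extra bit doubles the count).
constexpr std::uint64_t colours_for_bits(unsigned bits) {
    return std::uint64_t{1} << bits;
}

// Value of one hexadecimal digit: '0'-'9', 'A'-'F' or 'a'-'f' (not validated).
constexpr unsigned hex_digit(char c) {
    return (c >= '0' && c <= '9') ? static_cast<unsigned>(c - '0')
         : (c >= 'A' && c <= 'F') ? static_cast<unsigned>(c - 'A' + 10)
                                  : static_cast<unsigned>(c - 'a' + 10);
}

struct Rgb {
    unsigned r, g, b;  // each channel is one byte: 0..255
};

// "#RRGGBB": two hex digits (high nibble, low nibble) per 8-bit channel,
// so 6 hex digits = 24 bits = 2^24 possible colours.
constexpr Rgb parse_hex_colour(const char* s) {
    return Rgb{hex_digit(s[1]) * 16 + hex_digit(s[2]),
               hex_digit(s[3]) * 16 + hex_digit(s[4]),
               hex_digit(s[5]) * 16 + hex_digit(s[6])};
}
```

By the same arithmetic, an image stored with 8 bits per pixel can use only 256 colours, while 24-bit colour needs three bytes per pixel.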

This helpful material is used with gratitude from a computer science wiki under a Creative Commons Attribution 3.0 License [4]

Sound

  • Let's look at an oscilloscope
  • The BBC has an excellent article on how computers represent sound

Standards

  • Outline the way in which data is represented in the computer.

References

  • [1] http://www.bbc.co.uk/education/guides/zwsbwmn/revision/1
  • [2] https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
  • [3] IBO Computer Science Guide, First exams 2014
  • [4] https://compsci2014.wikispaces.com/2.1.10+Outline+the+way+in+which+data+is+represented+in+the+computer


Data Representation


Digital computers store and process information in binary form, because digital logic has only two values, "1" and "0" (equivalently "True" and "False", or "ON" and "OFF"). This number system is called radix 2. We humans generally work in radix 10, i.e. decimal. For convenience, several other representations are also used, such as octal (radix 8), hexadecimal (radix 16) and binary coded decimal (BCD).
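A minimal sketch of moving between these radices, assuming the value fits in an unsigned long; to_radix is a hypothetical helper, not a library function. It repeatedly divides by the radix and collects the remainders as digits.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: write value in the given radix (2..16),
// most significant digit first, by repeated division.
std::string to_radix(unsigned long value, unsigned radix) {
    assert(radix >= 2 && radix <= 16);
    const char* digits = "0123456789ABCDEF";
    if (value == 0) {
        return "0";
    }
    std::string out;
    while (value > 0) {
        // The remainder is the next (least significant) digit.
        out.insert(out.begin(), digits[value % radix]);
        value /= radix;
    }
    return out;
}
```

For example, 47 comes out as 101111 in binary, 57 in octal and 2F in hexadecimal.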

Every computer's CPU has a width measured in bits, such as an 8-bit, 16-bit or 32-bit CPU. Similarly, each memory location stores a fixed number of bits, called the memory width. Given the sizes of the CPU and memory, it is up to the programmer to handle the representation of the data. As most readers will know, 4 bits form a nibble and 8 bits form a byte. The word length is defined by the Instruction Set Architecture of the CPU and may be equal to the width of the CPU.

Memory simply stores information as a binary pattern of 1s and 0s; what the content of a memory location means depends on how it is interpreted. If the CPU is in the fetch cycle, it interprets the fetched memory content as an instruction and decodes it according to the instruction format. In the execute cycle, the information from memory is treated as data. As ordinary users, we think of computers as handling English or other alphabets, special characters or numbers. A programmer thinks of memory content as the data types of the programming language in use. Recall figures 1.2 and 1.3 of chapter 1, which show that conversion happens between the computer's user interface and its internal representation and storage.

  • Data Representation in Computers

Information handled by a computer is classified as instructions and data. A broad overview of the internal representation of information is illustrated in figure 3.1. Whether the data is numeric or non-numeric, integer or otherwise, everything is internally represented in binary. It is up to the programmer to handle the interpretation of the binary pattern, and this interpretation is called data representation. These data representation schemes are standardized by international organizations.

The choice of data representation to be used in a computer is decided by:

  • The number types to be represented (integer, real, signed, unsigned, etc.)
  • The range of values likely to be represented (the maximum and minimum values)
  • The precision of the numbers, i.e. the maximum accuracy of representation (floating point single precision, double precision, etc.)
  • For non-numeric data, i.e. characters, the character representation standard to be used. ASCII, EBCDIC and UTF are examples of character representation standards.
  • The hardware support in terms of word width and instruction set.

Before we go into the details, let us take an example of interpretation. Say a byte in memory has the value "0011 0001". Although this pattern has many possible interpretations, as shown in figure 3.2, a program uses only one of them, as decided by the programmer and declared in the program.
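The byte 0011 0001 can be read in several ways. This sketch, assuming an ASCII execution character set and 8-bit bytes, shows three of them side by side; the variable names are illustrative only.

```cpp
#include <cstdint>

// One byte from memory: 0011 0001 in binary (0x31).
constexpr std::uint8_t memory_byte = 0b0011'0001;

// The same bit pattern read three different ways (assumes ASCII).
constexpr unsigned as_unsigned = memory_byte;                  // unsigned integer: 49
constexpr char as_character = static_cast<char>(memory_byte);  // ASCII character '1'
// Packed BCD: high nibble = tens digit (3), low nibble = units digit (1): 31.
constexpr unsigned as_bcd = (memory_byte >> 4) * 10 + (memory_byte & 0x0F);
```

Which reading is correct depends entirely on how the program declares and uses the byte; the bits themselves carry no type.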

  • Fixed point Number Representation

Fixed point numbers are also known as whole numbers or integers. The number of bits used to represent an integer determines the maximum number that can be represented in the system hardware. For efficiency of storage and operations, however, one may choose to represent an integer with one, two, four or more bytes. This space allocation follows from the programmer's declaration of the variable (for example as a short or long integer) and from the Instruction Set Architecture.

In addition to the bit length definition for integers, we also have a choice to represent them as below:

  • Unsigned Integer: Only zero and positive numbers can be represented in this format, and all the allotted bits are used for the value. So with 8 bits, 2^8 = 256 values can be represented, i.e. 0 to 255. With 16 bits, 2^16 = 65,536 values can be represented, i.e. 0 to 65535.
  • Signed Integer: In this format negative numbers, zero and positive numbers can be represented. A sign bit indicates whether the number is positive or negative. There are three possible representations for signed integers: Sign Magnitude format, 1's Complement format and 2's Complement format.
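The ranges follow directly from the number of bits. The sketch below computes them, assuming 2's complement for the signed case (sign magnitude and 1's complement lose one value to a second zero, so their minimum is one higher). The function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers; valid for 1 <= bits <= 63.

// Unsigned: n bits give 2^n values, 0 .. 2^n - 1.
constexpr std::uint64_t unsigned_max(unsigned bits) {
    return (std::uint64_t{1} << bits) - 1;
}

// Two's complement signed: -2^(n-1) .. 2^(n-1) - 1.
constexpr std::int64_t signed_min(unsigned bits) {
    return -(std::int64_t{1} << (bits - 1));
}

constexpr std::int64_t signed_max(unsigned bits) {
    return (std::int64_t{1} << (bits - 1)) - 1;
}
```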

Signed Integer – Sign Magnitude format: The Most Significant Bit (MSB) is reserved to indicate the sign of the value. A "0" in the MSB means a positive number and a "1" in the MSB means a negative number. If n bits are used for the representation, the remaining n-1 bits hold the absolute value of the number.

Examples for n=8:

0010 1111 = + 47 Decimal (Positive number)

1010 1111 = - 47 Decimal (Negative Number)

0111 1110 = +126 (Positive number)

1111 1110 = -126 (Negative Number)

0000 0000 = + 0 (Positive Number)

1000 0000 = - 0 (Negative Number)
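A sketch of 8-bit sign magnitude encoding and decoding that reproduces the examples above. The function names are hypothetical, and values outside -127..+127 are not handled.

```cpp
#include <cstdint>

// Hypothetical helpers for 8-bit sign magnitude, values in -127..+127.
// Bit 7 is the sign; bits 0-6 hold the absolute value.
constexpr std::uint8_t to_sign_magnitude(int value) {
    const unsigned magnitude = value < 0 ? -value : value;
    const unsigned sign = value < 0 ? 0x80u : 0x00u;
    return static_cast<std::uint8_t>(sign | (magnitude & 0x7Fu));
}

constexpr int from_sign_magnitude(std::uint8_t bits) {
    const int magnitude = bits & 0x7F;
    return (bits & 0x80) ? -magnitude : magnitude;
}
```

Decoding either 0000 0000 or 1000 0000 gives 0, which is exactly the double-zero redundancy of this format.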

Although this method is easy to understand, sign magnitude representation has several shortcomings:

  • Zero can be represented in two ways (+0 and -0), causing redundancy and confusion.
  • Only n-1 bits hold the magnitude, so the largest magnitude is 2^(n-1) - 1 (127 for n = 8), even though n bits are used.
  • The separate sign bit makes addition and subtraction more complicated, and comparing two numbers is not straightforward.

Signed Integer – 1’s Complement format: In this format too, the MSB is the sign bit, and positive numbers are represented as they are in binary. The difference lies in negative numbers: a negative number is formed by inverting every bit of the corresponding positive number, hence the name 1’s Complement. Let us see some examples to better our understanding.

1101 0000 = - 47 Decimal (Negative Number)

1000 0001 = -126 (Negative Number)

1111 1111 = - 0 (Negative Number)
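The same inversion rule, written as a sketch for 8-bit values (the function name is hypothetical; the input is assumed to lie in -127..+127).

```cpp
#include <cstdint>

// Hypothetical helper: 8-bit one's complement for values in -127..+127.
// A negative number is the bitwise NOT of its positive counterpart.
constexpr std::uint8_t to_ones_complement(int value) {
    const std::uint8_t positive = static_cast<std::uint8_t>(value < 0 ? -value : value);
    // ~ works on the promoted int, so cast back down to 8 bits.
    return value < 0 ? static_cast<std::uint8_t>(~positive) : positive;
}
```

Inverting 0000 0000 gives 1111 1111, the "negative zero" shown above.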

  • Converting a given binary number to its 2's complement form

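Step 1. -x = x' + 1, where x' is the one's complement of x.

Step 2. To widen the number to a larger data width, fill the new high-order bits with copies of the MSB (sign extension).

Example: -47 decimal in 8-bit representation

0010 1111 = +47

1101 0000 = 1's complement of +47

1101 0001 = 1's complement + 1 = -47

Sign-extended to 16 bits, -47 becomes 1111 1111 1101 0001.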

As you can see, zero has no redundant representation: there is only one way of representing zero (0000 0000). The other problem, the complexity of arithmetic operations, is also eliminated in 2’s complement representation: subtraction is performed as addition of the 2's complement of the number being subtracted.
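The two steps and the "subtraction as addition" idea, sketched for 8-bit values. All function names are hypothetical; the casts back to 8 bits matter because C++ performs ~ and + on the promoted int type.

```cpp
#include <cstdint>

// Hypothetical helpers for 8-bit two's complement.

// Step 1: -x = x' + 1, where x' = ~x is the one's complement.
constexpr std::uint8_t negate8(std::uint8_t x) {
    return static_cast<std::uint8_t>(~x + 1);  // computed in int; keep the low 8 bits
}

// Step 2: sign extension to 16 bits, copying the MSB into the new high byte.
constexpr std::uint16_t sign_extend_8_to_16(std::uint8_t x) {
    return (x & 0x80) ? static_cast<std::uint16_t>(0xFF00 | x)
                      : static_cast<std::uint16_t>(x);
}

// Subtraction performed as addition of the 2's complement: a - b == a + (-b).
constexpr std::uint8_t subtract8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + negate8(b));
}
```

For example, 50 - 47 computed as 50 + (-47) yields 3, and 3 - 5 yields 1111 1110, the 2's complement pattern of -2.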

Further number-conversion exercises are left to the interested reader.

  • Floating Point Number system

With n bits, the largest whole number that can be represented is 2^n - 1. In the scientific world, we come across numbers such as the mass of an electron, 9.10939 × 10^-31 kg, or the velocity of light, 2.99792458 × 10^8 m/s. Imagine writing such numbers on paper without an exponent and then converting them into binary for computer representation: the result is tedious to produce, hard to read and awkward to process. Hence we write such large or small numbers using an exponent and a mantissa. This is called floating point representation, or real number representation. Note that the real number system has infinitely many values between 0 and 1, so a computer can represent only some of them exactly.

Representation in computer

Unlike the two's complement representation used for integers, a floating point number uses sign and magnitude representation for the mantissa; in the IEEE 754 standard the exponent is stored in biased form. In the number 9.10939 × 10^31, written in decimal, +31 is the exponent and 9.10939 is the fraction. Mantissa, significand and fraction are used as synonyms. In the computer, the representation is binary and the position of the binary point is not fixed. For example, the number 23.345 can be written as 2.3345 × 10^1, 0.23345 × 10^2 or 2334.5 × 10^-2. The representation 2.3345 × 10^1 is said to be in normalised form.
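Normalised form has a binary counterpart. The standard library function std::frexp splits a value into a fraction and a power-of-two exponent, with the fraction's magnitude in [0.5, 1); note that this is a slightly different normalisation from the 1.xxx form that IEEE 754 stores. The wrapper decompose and the struct are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Fraction and power-of-two exponent with x == fraction * 2^exponent.
struct Decomposed {
    double fraction;  // 0.5 <= |fraction| < 1, or 0 when x is 0
    int exponent;
};

// Hypothetical wrapper around std::frexp.
Decomposed decompose(double x) {
    int exponent = 0;
    const double fraction = std::frexp(x, &exponent);
    // Splitting is exact: ldexp(fraction, exponent) rebuilds x (for finite x).
    assert(!std::isfinite(x) || std::ldexp(fraction, exponent) == x);
    return {fraction, exponent};
}
```

For instance, 8.0 splits into 0.5 × 2^4, and -2.5 into -0.625 × 2^2.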

A floating point number may need more than one word of memory, since we must allot a sign bit, some bits for the exponent and many bits for the mantissa. There are standards for this allocation, which we look at next.

  • IEEE 754 Floating Point Representation

IEEE defines two standard formats, single precision and double precision. These standards enable portability among different computers. Figure 3.3 illustrates single precision and figure 3.4 illustrates double precision. Single precision uses a 32-bit format, while double precision uses a 64-bit word. As the name suggests, double precision can represent fractions with greater accuracy. In both cases the MSB is the sign bit of the number (i.e. of the mantissa), followed by the exponent field (8 bits in single precision, 11 bits in double precision) and the mantissa field (23 bits in single precision, 52 bits in double precision). The exponent field has no sign bit of its own: it is stored with a bias of 127 (single) or 1023 (double) added.

Note that in single precision, normalised numbers can have exponents in the range -126 to +127 (the biased exponent values 0 and 255 are reserved for zero, subnormal numbers, infinities and NaN). An arithmetic operation may produce a result whose exponent does not fit in this range. This situation is called overflow in the case of a positive exponent and underflow in the case of a negative exponent. The double precision format has 11 bits for the exponent, allowing exponents from -1022 to +1023. The programmer has to choose between single and double precision declarations using knowledge of the data being handled.
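A sketch of pulling the three single-precision fields out of a float, assuming the platform's float is IEEE 754 single precision (true on x86-64 and most current hardware). Copying the bytes with std::memcpy avoids the aliasing problems of casting a float* to an integer pointer; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The three fields of an IEEE 754 single-precision value.
struct FloatFields {
    std::uint32_t sign;             // 1 bit
    std::uint32_t biased_exponent;  // 8 bits: true exponent + 127
    std::uint32_t fraction;         // 23 bits; the leading 1 is implicit
};

// Hypothetical helper; assumes float is IEEE 754 single precision.
FloatFields float_fields(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof bits);  // copy the raw bytes
    return {bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}

// Remove the bias to get the true exponent (valid for normalised numbers).
int unbiased_exponent(const FloatFields& fields) {
    assert(fields.biased_exponent != 0 && fields.biased_exponent != 255);
    return static_cast<int>(fields.biased_exponent) - 127;
}
```

For example, -2.5 is -1.01 (binary) × 2^1, so its fields are sign 1, biased exponent 1 + 127 = 128 and fraction bits 0x200000.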

Floating point operations carried out in software on a CPU without floating point hardware are very slow. Traditionally, a special-purpose processor known as a floating point co-processor was used, working in tandem with the main CPU; most modern CPUs integrate this floating point unit on the same chip. The programmer should use a floating point declaration only when the data really is in real number form; floating point declarations should not be used indiscriminately.

  • Decimal Numbers Representation

Decimal numbers (radix 10) can be represented and processed in the system with the support of additional hardware. We deal with numbers in decimal format in everyday life. Some machines implement decimal arithmetic in hardware, much as they implement floating point arithmetic. In such a case, the CPU holds decimal numbers in BCD (binary coded decimal) form and performs BCD arithmetic, which operates in radix 10 without converting to pure binary. In packed BCD form, each decimal digit occupies one nibble. BCD operations require not only special hardware but also a decimal instruction set.
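Packed BCD is easy to emulate in software, which shows how each nibble carries one decimal digit. A minimal sketch with hypothetical function names, assuming values up to 99,999,999 so that eight digits fit in 32 bits:

```cpp
#include <cstdint>

// Hypothetical helpers for packed BCD: one decimal digit per 4-bit nibble.
// Assumes values up to 99,999,999 (eight digits fill 32 bits).
constexpr std::uint32_t to_packed_bcd(std::uint32_t value) {
    std::uint32_t bcd = 0;
    for (unsigned shift = 0; value > 0; shift += 4) {
        bcd |= (value % 10) << shift;  // next decimal digit into the next nibble
        value /= 10;
    }
    return bcd;
}

constexpr std::uint32_t from_packed_bcd(std::uint32_t bcd) {
    std::uint32_t value = 0;
    for (std::uint32_t place = 1; bcd > 0; bcd >>= 4, place *= 10) {
        value += (bcd & 0xF) * place;  // low nibble is the next decimal digit
    }
    return value;
}
```

So 1234 is stored as the bit pattern 0x1234: the hexadecimal digits read the same as the decimal ones.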

  • Exceptions and Error Detection

We all know that arithmetic operations can produce answers with more digits than the operands (e.g. 8 × 2 = 16). This happens in computer arithmetic too. When the result exceeds the size allotted to the variable or register, an exception condition arises. The exception conditions associated with numbers and number operations are Overflow, Underflow, Truncation, Rounding and Multiple Precision. These are detected by the associated hardware in the arithmetic unit, and they apply to both fixed point and floating point operations. Each of these conditions has a flag bit assigned in the Processor Status Word (PSW). We may discuss them in more detail in later chapters.
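Overflow detection for 8-bit additions can be sketched by doing the addition in a wider type and checking whether the exact result still fits. This only illustrates the condition; real processors detect it with the carry and overflow flags rather than by widening, and the function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: unsigned 8-bit addition overflows when the true sum
// needs a ninth bit (a carry out of bit 7).
constexpr bool add_overflows_u8(std::uint8_t a, std::uint8_t b) {
    return unsigned{a} + unsigned{b} > 0xFF;
}

// Hypothetical helper: signed 8-bit addition overflows when the true sum
// lies outside -128..127.
constexpr bool add_overflows_s8(std::int8_t a, std::int8_t b) {
    const int sum = int{a} + int{b};  // computed exactly in the wider int type
    return sum < -128 || sum > 127;
}
```

200 + 100 overflows an unsigned byte, and 100 + 100 overflows a signed byte, even though both sums fit comfortably in an int.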

  • Character Representation

The other major data type is non-numeric, mostly characters. We use a human-understandable character set to communicate with the computer, for both input and output. Standard character sets such as EBCDIC and ASCII are used to represent letters, digits and special characters. Nowadays the Unicode standard is also used for languages other than English, such as Chinese, Hindi and Spanish. The code tables are freely available on the internet for interested readers.
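Unicode code points are stored in bytes by an encoding such as UTF-8, one of the UTF standards mentioned earlier. The sketch below, with a hypothetical function name, encodes a single code point; it assumes a valid code point (at most 0x10FFFF and not a surrogate) and does no other error handling. ASCII characters keep their one-byte codes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Assumes a valid code point: at most 0x10FFFF and not a surrogate.
std::string utf8_encode(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {            // 0xxxxxxx: ASCII is unchanged
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {    // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {  // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

'A' stays the single byte 0x41, while 'é' (U+00E9) becomes the two bytes 0xC3 0xA9.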

Data representation


This course is about learning how computers work, from the perspective of systems software: what makes programs work fast or slow, and how properties of the machines we program impact the programs we write. We want to communicate ideas, tools, and an experimental approach.

The course divides into six units:

  • Data representation
  • Assembly & machine programming
  • Storage & caching
  • Kernel programming
  • Process management
  • Concurrency

The first unit, data representation, is all about how different forms of data can be represented in terms the computer can understand.

Computer memory is kind of like a Lite Brite.


A Lite Brite is a big black backlit pegboard coupled with a supply of colored pegs in a limited set of colors. You can plug in the pegs to make all kinds of designs. A computer’s memory is like a vast pegboard where each slot holds one of 256 different colors. The colors are numbered 0 through 255, so each slot holds one byte. (A byte is a number between 0 and 255, inclusive.)

A slot of computer memory is identified by its address. On a computer with M bytes of memory, and therefore M slots, you can think of the address as a number between 0 and M−1. My laptop has 16 gibibytes of memory, so \(M = 16\times2^{30} = 2^{34}\) = 17,179,869,184 = 0x4'0000'0000, a very large number!

The problem of data representation is the problem of representing all the concepts we might want to use in programming—integers, fractions, real numbers, sets, pictures, texts, buildings, animal species, relationships—using the limited medium of addresses and bytes.

Powers of ten and powers of two. Digital computers love the number two and all powers of two. The electronics of digital computers are based on the bit, the smallest unit of storage, which is a base-two digit: either 0 or 1. More complicated objects are represented by collections of bits. This choice has many scale and error-correction advantages. It also refracts upwards to larger choices, and even into terminology. Memory chips, for example, have capacities based on large powers of two, such as \(2^{30}\) bytes. Since \(2^{10}\) = 1024 is pretty close to 1,000, \(2^{20}\) = 1,048,576 is pretty close to a million, and \(2^{30}\) = 1,073,741,824 is pretty close to a billion, it’s common to refer to \(2^{30}\) bytes of memory as “a gigabyte,” even though that actually means \(10^9\) = 1,000,000,000 bytes. But for greater precision, there are terms that explicitly signal the use of powers of two. \(2^{30}\) bytes is a gibibyte: the “-bi-” component means “binary.”
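The gap between the binary and decimal prefixes, and the 16 GiB laptop figure from earlier, can be checked with compile-time arithmetic. A sketch; the constant names are our own.

```cpp
#include <cstdint>

// Binary (IEC) prefixes are powers of two; decimal (SI) prefixes are powers of ten.
constexpr std::uint64_t KiB = std::uint64_t{1} << 10;  // 1,024
constexpr std::uint64_t MiB = std::uint64_t{1} << 20;  // 1,048,576
constexpr std::uint64_t GiB = std::uint64_t{1} << 30;  // 1,073,741,824
constexpr std::uint64_t GB = 1'000'000'000;            // 10^9

// The 16 GiB laptop: M = 16 x 2^30 = 2^34 bytes, addresses 0 .. M - 1.
constexpr std::uint64_t M = 16 * GiB;
constexpr std::uint64_t highest_physical_address = M - 1;
```

A gibibyte is therefore about 7.4% larger than a gigabyte.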
Virtual memory. Modern computers actually abstract their memory spaces using a technique called virtual memory. The lowest-level kind of address, called a physical address, really does take on values between 0 and M−1. However, even on a 16GiB machine like my laptop, the addresses we see in programs can take on values like 0x7ffe'ea2c'aa67 that are much larger than M−1 = 0x3'ffff'ffff. The addresses used in programs are called virtual addresses. They’re incredibly useful for protection: since different running programs have logically independent address spaces, it’s much less likely that a bug in one program will crash the whole machine. We’ll learn about virtual memory in much more depth in the kernel unit; the distinction between virtual and physical addresses is not as critical for data representation.

Most programming languages prevent their users from directly accessing memory. But not C and C++! These languages let you access any byte of memory with a valid address. This is powerful; it is also very dangerous. But it lets us get a hands-on view of how computers really work.

C++ programs accomplish their work by constructing, examining, and modifying objects. An object is a region of data storage that contains a value, such as the integer 12. (The standard specifically says “a region of data storage in the execution environment, the contents of which can represent values”.) Memory is called “memory” because it remembers object values.

In this unit, we often use functions called hexdump to examine memory. These functions are defined in hexdump.cc. hexdump_object(x) prints out the bytes of memory that comprise an object named x, while hexdump(ptr, size) prints out the size bytes of memory starting at a pointer ptr.

For example, in datarep1/add.cc, we might use hexdump_object to examine the memory used to represent some integers:

This display reports that a, b, and c are each four bytes long; that a, b, and c are located at different, nonoverlapping addresses (the long hex number in the first column); and shows us how the numbers 1, 2, and 3 are represented in terms of bytes. (More on that later.)
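The course's hexdump.cc is not reproduced here. As a rough stand-in, this hypothetical helper formats the bytes of any object in address order, which is enough to see sizes and byte values; it prints no addresses and is not the course's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not the course's hexdump.cc): format `size` bytes
// starting at `ptr` as two-digit hex values, lowest address first.
std::string hex_bytes(const void* ptr, std::size_t size) {
    assert(ptr != nullptr || size == 0);
    const unsigned char* p = static_cast<const unsigned char*>(ptr);
    std::string out;
    char buf[4];  // " xx" plus the terminating '\0'
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(buf, sizeof buf, i == 0 ? "%02x" : " %02x",
                      static_cast<unsigned>(p[i]));
        out += buf;
    }
    return out;
}

// Like hexdump_object(x): the bytes that make up object x.
template <typename T>
std::string hex_object(const T& x) {
    return hex_bytes(&x, sizeof x);
}
```

On x86-64, hex_object(1) shows 01 00 00 00, because the least significant byte is stored first (little-endian, discussed below).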

The compiler, hardware, and standard together define how objects of different types map to bytes. Each object uses a contiguous range of addresses (and thus bytes), and objects never overlap (objects that are active simultaneously are always stored in distinct address ranges).

Since C and C++ are designed to help software interface with hardware devices, their standards are transparent about how objects are stored. A C++ program can ask how big an object is using the sizeof keyword. sizeof(T) returns the number of bytes in the representation of an object of type T, and sizeof(x) returns the size of object x. The result of sizeof is a value of type size_t, which is an unsigned integer type large enough to hold any representable size. On 64-bit architectures, such as x86-64 (our focus in this course), size_t can hold numbers between 0 and \(2^{64}-1\).
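A small sketch of sizeof on types and on objects. Only sizeof(char) == 1 is guaranteed everywhere; std::uint32_t is 4 bytes wherever it exists with 8-bit bytes, while sizeof(long) varies by platform. The names below are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// sizeof works on types and on objects; the result has type std::size_t.
constexpr std::size_t char_size = sizeof(char);          // 1 by definition
constexpr std::size_t u32_size = sizeof(std::uint32_t);  // 4 with 8-bit bytes

long example_object = 0;
// sizeof does not evaluate its operand, so this is a compile-time constant
// even though example_object is an ordinary variable. It equals sizeof(long):
// 8 on x86-64 Linux, 4 on some other platforms.
constexpr std::size_t object_size = sizeof example_object;

// Largest representable size: 2^64 - 1 when size_t is 64 bits, as on x86-64.
constexpr std::size_t size_t_max = std::numeric_limits<std::size_t>::max();
```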

Qualitatively different objects may have the same data representation. For example, the following three objects have the same data representation on x86-64, which you can verify using hexdump:

In C and C++, you can’t reliably tell the type of an object by looking at the contents of its memory. That’s why tricks like our different addf*.cc functions work.

An object can have many names. For example, here, local and *ptr refer to the same object:

The different names for an object are sometimes called aliases.
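The original listing is not shown here, so this is a hypothetical reconstruction of the aliasing idea in self-contained form, assuming a function-local int named local and a pointer named ptr.

```cpp
// Hypothetical reconstruction: `local` and `*ptr` name the same int.
constexpr int alias_demo() {
    int local = 1;
    int* ptr = &local;  // ptr holds local's address
    *ptr = 2;           // write through one name...
    return local;       // ...and read it back through the other
}

constexpr bool same_address() {
    int local = 0;
    int* ptr = &local;
    return &*ptr == &local;  // both names denote one object at one address
}
```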

There are five objects here:

  • ch1, a global variable
  • ch2, a constant (non-modifiable) global variable
  • ch3, a local variable
  • ch4, a local variable
  • the anonymous storage allocated by new char and accessed by *ch4

Each object has a lifetime, which is called storage duration by the standard. There are three different kinds of lifetime.

  • static lifetime: The object lasts as long as the program runs. (ch1, ch2)
  • automatic lifetime: The compiler allocates and destroys the object automatically as the program runs, based on the object’s scope (the region of the program in which it is meaningful). (ch3, ch4)
  • dynamic lifetime: The programmer allocates and destroys the object explicitly. (*ch4)
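A hypothetical reconstruction of the five objects, consistent with the lists above (the original program is not shown here). The comments note each object's lifetime and the segment it typically lands in.

```cpp
#include <cassert>

char ch1 = 'a';        // global: static lifetime (data segment)
const char ch2 = 'b';  // constant global: static lifetime (read-only segment)

// Hypothetical function body consistent with the lists above.
char lifetime_demo() {
    char ch3 = 'c';             // local: automatic lifetime (stack)
    char* ch4 = new char{'d'};  // ch4 itself is automatic; *ch4 is dynamic (heap)
    assert(ch3 == 'c' && *ch4 == 'd');
    char copy = *ch4;
    delete ch4;                 // dynamic objects must be freed explicitly
    return copy;                // ch3 and ch4 disappear when the function returns
}
```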

Objects with dynamic lifetime aren’t easy to use correctly. Dynamic lifetime causes many serious problems in C programs, including memory leaks, use-after-free, double-free, and so forth. Those serious problems cause undefined behavior and play a “disastrously central role” in “our ongoing computer security nightmare”. But dynamic lifetime is critically important. Only with dynamic lifetime can you construct an object whose size isn’t known at compile time, or construct an object that outlives the function that created it.

The compiler and operating system work together to put objects at different addresses. A program’s address space (which is the range of addresses accessible to a program) divides into regions called segments. Objects with different lifetimes are placed into different segments. The most important segments are:

  • Code (also known as text or read-only data). Contains instructions and constant global objects. Unmodifiable; static lifetime.
  • Data. Contains non-constant global objects. Modifiable; static lifetime.
  • Heap. Modifiable; dynamic lifetime.
  • Stack. Modifiable; automatic lifetime.

The compiler decides on a segment for each object based on its lifetime. The final compiler phase, which is called the linker, then groups all the program’s objects by segment (so, for instance, global variables from different compiler runs are grouped together into a single segment). Finally, when a program runs, the operating system loads the segments into memory. (The stack and heap segments grow on demand.)

We can use a program to investigate where objects with different lifetimes are stored. (See cs61-lectures/datarep2/mexplore0.cc .) This shows address ranges like this:

Constant global data and global data have the same lifetime, but are stored in different segments. The operating system uses different segments so it can prevent the program from modifying constants. It marks the code segment, which contains functions (instructions) and constant global data, as read-only, and any attempt to modify code-segment memory causes a crash (a “Segmentation violation”).
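A sketch in the spirit of mexplore0.cc (not that file): it records the address of one object from each segment as an integer. The exact values change between runs and machines because of address-space randomization, so only the grouping is meaningful; all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

const int constant_global = 1;  // code / read-only data segment
int global_int = 2;             // data segment

// Hypothetical sketch: one address per segment, stored as integers.
struct SegmentAddresses {
    std::uintptr_t rodata, data, heap, stack;
};

SegmentAddresses segment_addresses() {
    int local = 3;                  // stack
    int* heap_object = new int{4};  // heap
    SegmentAddresses a{reinterpret_cast<std::uintptr_t>(&constant_global),
                       reinterpret_cast<std::uintptr_t>(&global_int),
                       reinterpret_cast<std::uintptr_t>(heap_object),
                       reinterpret_cast<std::uintptr_t>(&local)};
    delete heap_object;
    return a;  // the values are just numbers now; nothing is dereferenced
}

// Objects alive at the same time never share an address.
bool all_distinct(const SegmentAddresses& a) {
    const std::uintptr_t v[] = {a.rodata, a.data, a.heap, a.stack};
    for (int i = 0; i < 4; ++i) {
        for (int j = i + 1; j < 4; ++j) {
            if (v[i] == v[j]) return false;
        }
    }
    return true;
}

void print_segment_addresses() {
    const SegmentAddresses a = segment_addresses();
    std::printf("constant global %#llx\n", static_cast<unsigned long long>(a.rodata));
    std::printf("global          %#llx\n", static_cast<unsigned long long>(a.data));
    std::printf("heap            %#llx\n", static_cast<unsigned long long>(a.heap));
    std::printf("stack           %#llx\n", static_cast<unsigned long long>(a.stack));
}
```

On x86-64 Linux the two globals typically print close together, the heap somewhat above them and the stack far higher, though the C++ standard guarantees none of this ordering.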

An executable is normally at least as big as the static-lifetime data (the code and data segments together). Since all that data must be in memory for the entire lifetime of the program, it’s written to disk and then loaded by the OS before the program starts running. There is an exception, however: the “bss” segment is used to hold modifiable static-lifetime data with initial value zero. Such data is common, since all static-lifetime data is initialized to zero unless otherwise specified in the program text. Rather than storing a bunch of zeros in the object files and executable, the compiler and linker simply track the location and size of all zero-initialized global data. The operating system sets this memory to zero during the program load process. Clearing memory is faster than loading data from disk, so this optimization saves both time (the program loads faster) and space (the executable is smaller).
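Zero initialization of static-lifetime objects is guaranteed by the language, so the sketch below can check it. Whether these objects actually land in bss is up to the compiler and linker; the comments describe typical behavior, and the names are hypothetical.

```cpp
#include <cassert>

// Static-lifetime objects with no initializer start out as zero.
// Typically they go in the bss segment, so the executable stores only
// their size and location, not their (all-zero) bytes.
int zero_counter;               // hypothetical name; value 0
static char zero_buffer[4096];  // hypothetical name; 4096 zero bytes

bool statics_start_at_zero() {
    for (char c : zero_buffer) {
        if (c != 0) return false;
    }
    return zero_counter == 0;
}
```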

Abstract machine and hardware

Programming involves turning an idea into hardware instructions. This transformation happens in multiple steps, some you control and some controlled by other programs.

First you have an idea, like “I want to make a flappy bird iPhone game.” The computer can’t (yet) understand that idea. So you transform the idea into a program, written in some programming language. This process is called programming.

A C++ program actually runs on an abstract machine. The behavior of this machine is defined by the C++ standard, a technical document. This document is supposed to be so precisely written as to have an exact mathematical meaning, defining exactly how every C++ program behaves. But the document can’t run programs!

C++ programs run on hardware (mostly), and the hardware determines what behavior we see. Mapping abstract machine behavior to instructions on real hardware is the task of the C++ compiler (and the standard library and operating system). A C++ compiler is correct if and only if it translates each correct program to instructions that simulate the expected behavior of the abstract machine.

This same rough series of transformations happens for any programming language, although some languages use interpreters rather than compilers.

A bit is the fundamental unit of digital information: it’s either 0 or 1.

C++ manages memory in units of bytes: 8 contiguous bits that together can represent numbers between 0 and 255. C’s unit for a byte is char: the abstract machine says a byte is stored in char. That means an unsigned char holds values in the inclusive range [0, 255].

The C++ standard actually doesn’t require that a byte hold 8 bits, and on some crazy machines from decades ago, bytes could hold nine bits! (!?)
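Code can check the byte size it relies on. A minimal sketch using CHAR_BIT from <climits>: the static_assert checks only the standard's guarantee of at least 8 bits, and the comments on the remaining constants assume the usual 8.

```cpp
#include <climits>
#include <limits>

// CHAR_BIT is the number of bits in a byte (in a char).
static_assert(CHAR_BIT >= 8, "the standard guarantees at least 8 bits per byte");

// With the usual 8-bit bytes, unsigned char covers exactly [0, 255].
constexpr unsigned byte_max = std::numeric_limits<unsigned char>::max();  // UCHAR_MAX
constexpr unsigned byte_values = byte_max + 1;  // 2^CHAR_BIT distinct values
```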

But larger numbers, such as 258, don’t fit in a single byte. To represent such numbers, we must use multiple bytes. The abstract machine doesn’t specify exactly how this is done; it’s the compiler and hardware’s job to implement a choice. But modern computers always use place-value notation, just like in decimal numbers. In decimal, the number 258 is written with three digits, the meaning of each determined both by the digit itself and by its place in the overall number:

\[ 258 = 2\times10^2 + 5\times10^1 + 8\times10^0 \]

The computer uses base 256 instead of base 10. Two adjacent bytes can represent numbers between 0 and \(255\times256+255 = 65535 = 2^{16}-1\) , inclusive. A number larger than this would take three or more bytes.

\[ 258 = 1\times256^1 + 2\times256^0 \]
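Splitting a number into base-256 digits is the same procedure as splitting it into decimal digits, with 256 in place of 10. A sketch with a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: digits of value in base 256, most significant first.
// Each base-256 digit is exactly one byte.
std::vector<unsigned> base256_digits(std::uint64_t value) {
    std::vector<unsigned> digits;
    do {
        digits.insert(digits.begin(), static_cast<unsigned>(value % 256));  // lowest byte
        value /= 256;
    } while (value > 0);
    return digits;
}
```

So 258 yields the digits 1 and 2, matching the equation above, and 65535 yields 255 and 255.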

On x86-64, the ones place, the least significant byte, is on the left, at the lowest address in the contiguous two-byte range used to represent the integer. This is the opposite of how decimal numbers are written: decimal numbers put the most significant digit on the left. The representation choice of putting the least-significant byte in the lowest address is called little-endian representation. x86-64 uses little-endian representation.

Some computers actually store multi-byte integers the other way, with the most significant byte stored in the lowest address; that’s called big-endian representation. The Internet’s fundamental protocols, such as IP and TCP, also use big-endian order for multi-byte integers, so big-endian is also called “network” byte order.

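The C++ standard defines five fundamental unsigned integer types, along with relationships among their sizes. Here they are, along with their actual sizes and ranges on x86-64:

  • unsigned char: 1 byte, 0 to 255
  • unsigned short: 2 bytes, 0 to 65,535
  • unsigned int: 4 bytes, 0 to 4,294,967,295
  • unsigned long: 8 bytes, 0 to 18,446,744,073,709,551,615
  • unsigned long long: 8 bytes, 0 to 18,446,744,073,709,551,615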

Other architectures and operating systems implement different ranges for these types. For instance, on IA32 machines like Intel’s Pentium (the 32-bit processors that predated x86-64), sizeof(long) was 4, not 8.

Note that all values of a smaller unsigned integer type can fit in any larger unsigned integer type. When a value of a larger unsigned integer type is placed in a smaller unsigned integer object, however, not every value fits; for instance, the unsigned short value 258 doesn’t fit in an unsigned char x. When this occurs, the C++ abstract machine requires that the smaller object’s value equals the least-significant bits of the larger value (so x will equal 2).

In addition to these types, whose sizes can vary, C++ has integer types whose sizes are fixed. uint8_t, uint16_t, uint32_t, and uint64_t define 8-bit, 16-bit, 32-bit, and 64-bit unsigned integers, respectively; on x86-64, these correspond to unsigned char, unsigned short, unsigned int, and unsigned long.
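The truncation rule and the fixed sizes can both be checked at compile time. A hedged sketch with a hypothetical function name, assuming 8-bit bytes for the size checks:

```cpp
#include <cstdint>

// Hypothetical helper: storing a wider unsigned value in an unsigned char
// keeps only the low 8 bits (value mod 256). 258 = 0x0102 keeps 0x02.
constexpr unsigned char narrow_to_byte(unsigned short value) {
    return static_cast<unsigned char>(value);  // same result as `unsigned char x = value;`
}

// Fixed-width types have the same size everywhere they exist (8-bit bytes assumed).
static_assert(sizeof(std::uint8_t) == 1 && sizeof(std::uint16_t) == 2 &&
                  sizeof(std::uint32_t) == 4 && sizeof(std::uint64_t) == 8,
              "fixed-width unsigned types");
```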

This general procedure is used to represent a multi-byte integer in memory.

  • Write the large integer in hexadecimal format, including all leading zeros required by the type size. For example, the unsigned value 65534 would be written 0x0000FFFE. There will be twice as many hexadecimal digits as sizeof(TYPE).
  • Divide the integer into its component bytes, which are its digits in base 256. In our example, they are, from most to least significant, 0x00, 0x00, 0xFF, and 0xFE.

In little-endian representation, the bytes are stored in memory from least to most significant. If our example was stored at address 0x30, we would have:

  • 0x30: 0xFE
  • 0x31: 0xFF
  • 0x32: 0x00
  • 0x33: 0x00

In big-endian representation, the bytes are stored in the reverse order.
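The procedure can be sketched with shifts, which produce a chosen byte order on any machine, and compared with what the host actually stores (read back with std::memcpy). The function names are hypothetical; on x86-64 the host result is the little-endian one.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Bytes4 = std::array<std::uint8_t, 4>;

// Hypothetical helpers. Shifts pick out bytes by significance, so these give
// the same answer on any machine.
constexpr Bytes4 to_little_endian(std::uint32_t v) {  // least significant first
    return {static_cast<std::uint8_t>(v), static_cast<std::uint8_t>(v >> 8),
            static_cast<std::uint8_t>(v >> 16), static_cast<std::uint8_t>(v >> 24)};
}

constexpr Bytes4 to_big_endian(std::uint32_t v) {  // most significant first
    return {static_cast<std::uint8_t>(v >> 24), static_cast<std::uint8_t>(v >> 16),
            static_cast<std::uint8_t>(v >> 8), static_cast<std::uint8_t>(v)};
}

// What this machine actually stores: copy the object's bytes out of memory.
Bytes4 host_bytes(std::uint32_t v) {
    Bytes4 bytes{};
    std::memcpy(bytes.data(), &v, sizeof v);
    return bytes;
}
```

For 65534, to_little_endian gives FE FF 00 00 and to_big_endian gives 00 00 FF FE, matching the procedure above.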

Computers are often fastest at dealing with fixed-length numbers, rather than variable-length numbers, and processor internals are organized around a fixed word size. A word is the natural unit of data used by a processor design. In most modern processors, this natural unit is 8 bytes or 64 bits, because this is the power-of-two number of bytes big enough to hold those processors’ memory addresses. Many older processors could access less memory and had correspondingly smaller word sizes, such as 4 bytes (32 bits).

The best representation for signed integers (and the choice made by x86-64, and by the C++20 abstract machine) is two’s complement. Two’s complement representation is based on this principle: Addition and subtraction of signed integers shall use the same instructions as addition and subtraction of unsigned integers.

To see what this means, let’s think about what -x should mean when x is an unsigned integer. Wait, negative unsigned?! This isn’t an oxymoron because C++ uses modular arithmetic for unsigned integers: the result of an arithmetic operation on unsigned values is always taken modulo 2 B , where B is the number of bits in the unsigned value type. Thus, on x86-64,

-x is simply the number that, when added to x, yields 0 (mod $2^B$). For example, when unsigned x = 0xFFFFFFFFU, then -x == 1U, since x + -x equals zero (mod $2^{32}$).

To obtain -x, we flip all the bits in x (an operation written ~x) and then add 1. To see why, consider the bit representations. What is x + (~x + 1)? Well, bit i of ~x is 1 whenever bit i of x is 0, and vice versa. That means that every bit of x + ~x is 1 (there are no carries), and x + ~x is the largest unsigned integer, with value $2^B - 1$. If we add 1 to this, we get $2^B$, which is 0 (mod $2^B$)! The highest “carry” bit is dropped, leaving zero.
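The flip-and-add-one rule can be checked with ordinary unsigned arithmetic. A sketch assuming a 32-bit uint32_t; negate_by_flip and check_negation are hypothetical names. Because unsigned arithmetic wraps modulo $2^{32}$, the carry out of the top bit is simply discarded, exactly as in the argument above.

```cpp
#include <cassert>
#include <cstdint>

// Negate an unsigned value by flipping every bit and adding 1.
// The addition wraps modulo 2^32, so the final carry out of bit 31 is dropped.
uint32_t negate_by_flip(uint32_t x) {
    return ~x + 1U;
}

void check_negation() {
    assert(negate_by_flip(0xFFFFFFFFU) == 1U);   // the example from the text
    assert(negate_by_flip(1U) == 0xFFFFFFFFU);
    assert(negate_by_flip(0U) == 0U);
    uint32_t x = 12345;
    assert(negate_by_flip(x) == -x);             // agrees with built-in unsigned negation
    assert(x + negate_by_flip(x) == 0U);         // x + (-x) == 0 (mod 2^32)
}
```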

Two’s complement arithmetic uses half of the unsigned integer representations for negative numbers. A two’s-complement signed integer with B bits has the following values:

  • If the most-significant bit is 1, the represented number is negative. Specifically, the represented number is -(~x + 1), where x is the bit pattern and the outer negative sign is mathematical negation (not computer arithmetic).
  • If every bit is 0, the represented number is 0.
  • If the most-significant bit is 0 but some other bit is 1, the represented number is positive.

The most significant bit is also called the sign bit, because if it is 1, then the represented value depends on the signedness of the type (and that value is negative for signed types).

Another way to think about two’s complement is that, for B-bit integers, the most-significant bit has place value $2^{B-1}$ in unsigned arithmetic and $-2^{B-1}$ in signed arithmetic. All other bits have the same place values in both kinds of arithmetic.
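The place-value view translates directly into code. In this sketch (hypothetical helpers as_unsigned and as_signed, valid for 1 ≤ B ≤ 31), the low B bits of an unsigned value are read both ways; the only difference is the place value given to the most-significant bit.

```cpp
#include <cassert>

// Value of the low B bits of `bits` (1 <= B <= 31) read as unsigned:
// every bit i has place value 2^i.
int as_unsigned(unsigned bits, int B) {
    return static_cast<int>(bits & ((1u << B) - 1));
}

// Value of the same B bits read as two's complement: the most significant
// bit has place value -2^(B-1); all other bits keep their usual place values.
int as_signed(unsigned bits, int B) {
    unsigned u = bits & ((1u << B) - 1);
    unsigned sign = (u >> (B - 1)) & 1u;          // the sign bit
    unsigned rest = u & ((1u << (B - 1)) - 1);    // all lower bits
    return static_cast<int>(rest) - static_cast<int>(sign) * (1 << (B - 1));
}

void check_4bit() {
    assert(as_unsigned(0b1100, 4) == 12 && as_signed(0b1100, 4) == -4);
    assert(as_signed(0b0101, 4) == 5);
    assert(as_signed(0b1000, 4) == -8);   // sign bit only: no positive counterpart
    assert(as_signed(0b1111, 4) == -1);
}
```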

The two’s-complement bit pattern for x + y is the same whether x and y are considered as signed or unsigned values. For example, in 4-bit arithmetic, 5 has representation 0b0101, while the representation 0b1100 represents 12 if unsigned and -4 if signed (~0b1100 + 1 == 0b0011 + 1 == 0b0100 == 4). Let’s add those bit patterns and see what we get: 0b0101 + 0b1100 == 0b10001, and since only 4 bits fit, the high carry bit is dropped, leaving 0b0001.

Note that this is the right answer for both signed and unsigned arithmetic: 5 + 12 = 17 = 1 (mod 16), and 5 + -4 = 1.

Subtraction and multiplication also produce the same results for unsigned arithmetic and signed two’s-complement arithmetic. (For instance, 5 * 12 = 60 = 12 (mod 16), and 5 * -4 = -20 = -4 (mod 16).) This is not true of division. (Consider dividing the 4-bit representation 0b1110 by 2. In signed arithmetic, 0b1110 represents -2, so 0b1110/2 == 0b1111 (-1); but in unsigned arithmetic, 0b1110 is 14, so 0b1110/2 == 0b0111 (7).) And, of course, it is not true of comparison. In signed 4-bit arithmetic, 0b1110 < 0, but in unsigned 4-bit arithmetic, 0b1110 > 0. This means that a C compiler for a two’s-complement machine can use a single add instruction for either signed or unsigned numbers, but it must generate different instruction patterns for signed and unsigned division (or less-than, or greater-than).
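C++ has no 4-bit type, so this sketch uses 8-bit values (uint8_t and int8_t) to show the same effect: addition and multiplication give identical bit patterns whether the operands are read as signed or unsigned, while division and comparison do not. The helper names are hypothetical. Converting a uint8_t to int8_t wraps modulo $2^8$ as required since C++20 (earlier standards call it implementation-defined, but mainstream compilers behave the same way).

```cpp
#include <cassert>
#include <cstdint>

// Read an 8-bit pattern as a signed (two's-complement) value.
int8_t as_signed8(uint8_t bits) { return static_cast<int8_t>(bits); }

// Each function returns the low 8 bits of the result. The operands are
// promoted to int before the operation, so nothing overflows; the final
// cast keeps only the low 8 bits.
uint8_t add_u(uint8_t a, uint8_t b) { return static_cast<uint8_t>(a + b); }
uint8_t add_s(uint8_t a, uint8_t b) { return static_cast<uint8_t>(as_signed8(a) + as_signed8(b)); }
uint8_t mul_u(uint8_t a, uint8_t b) { return static_cast<uint8_t>(a * b); }
uint8_t mul_s(uint8_t a, uint8_t b) { return static_cast<uint8_t>(as_signed8(a) * as_signed8(b)); }
uint8_t div_u(uint8_t a, uint8_t b) { return static_cast<uint8_t>(a / b); }
uint8_t div_s(uint8_t a, uint8_t b) { return static_cast<uint8_t>(as_signed8(a) / as_signed8(b)); }

void check_same_and_different() {
    // 5 + 252 (unsigned) and 5 + -4 (signed) produce the same bits: 0x01.
    assert(add_u(0x05, 0xFC) == 0x01 && add_s(0x05, 0xFC) == 0x01);
    // 5 * 252 = 1260 = 0xEC (mod 256); 5 * -4 = -20, which is also 0xEC.
    assert(mul_u(0x05, 0xFC) == 0xEC && mul_s(0x05, 0xFC) == 0xEC);
    // 0xFE / 2 is 127 (0x7F) unsigned but -1 (0xFF) signed.
    assert(div_u(0xFE, 2) == 0x7F && div_s(0xFE, 2) == 0xFF);
    // Comparison with zero differs too.
    assert(uint8_t{0xFE} > 0 && as_signed8(0xFE) < 0);
}
```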

There are a couple of quirks with C signed arithmetic. First, in two’s complement, there are more negative numbers than positive numbers. The bit pattern whose sign bit is 1 and whose other bits are all 0 has no positive counterpart at the same bit width: for this number, -x == x. (In 4-bit arithmetic, -0b1000 == ~0b1000 + 1 == 0b0111 + 1 == 0b1000.) Second, and far worse, arithmetic overflow on signed integers is undefined behavior.
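Because signed overflow is undefined, code that might overflow has to test before it adds. A sketch, assuming C++17 for std::optional; checked_add and check_quirks are hypothetical names, not library functions. It also shows the sign-bit-only pattern negating to itself, done with unsigned arithmetic so that nothing undefined happens.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <optional>

// Add two ints without ever executing a signed overflow: test the bounds first.
std::optional<int> checked_add(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return std::nullopt;  // sum would exceed INT_MAX
    if (b < 0 && a < INT_MIN - b) return std::nullopt;  // sum would fall below INT_MIN
    return a + b;
}

void check_quirks() {
    // The pattern 0x80000000 is its own negation. Unsigned negation is
    // well defined; negating INT_MIN as an int would overflow.
    uint32_t sign_only = 0x80000000u;
    assert(-sign_only == sign_only);

    assert(checked_add(INT_MAX - 1, 1) == INT_MAX);
    assert(!checked_add(INT_MAX, 1));    // would overflow, so no result
    assert(!checked_add(INT_MIN, -1));
}
```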

The C++ abstract machine requires that signed integers have the same sizes as their unsigned counterparts.

We distinguish pointers, which are concepts in the C abstract machine, from addresses, which are hardware concepts. A pointer combines an address and a type.

The memory representation of a pointer is the same as the representation of its address value. The size of that integer is the machine’s word size; for example, on x86-64, a pointer occupies 8 bytes, and a pointer to an object located at address 0x400abc would be stored, from lowest address to highest, as the bytes bc 0a 40 00 00 00 00 00 (least significant first, because x86-64 is little-endian).

The C++ abstract machine defines an unsigned integer type uintptr_t that can hold any address. (You have to #include <inttypes.h> or <cinttypes> to get the definition; <stdint.h> and <cstdint> also provide it.) On x86-64 Linux, uintptr_t is the same as unsigned long. Cast a pointer to an integer address value with syntax like (uintptr_t) ptr; cast back to a pointer with syntax like (T*) addr. Casts between pointer types and uintptr_t are information preserving, so for a pointer ptr of type T*, the assertion assert((T*) (uintptr_t) ptr == ptr) will never fail.
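A small sketch of the round trip, assuming a platform where uintptr_t is available (it is on x86-64 Linux); round_trip and show_round_trip are hypothetical names. PRIxPTR is the <cinttypes> macro for printing a uintptr_t in hexadecimal.

```cpp
#include <cassert>
#include <cinttypes>   // uintptr_t and the PRIxPTR printf macro
#include <cstdio>

// Convert a pointer to its integer address and back, using the cast syntax from the text.
template <typename T>
T* round_trip(T* ptr) {
    uintptr_t addr = (uintptr_t) ptr;   // pointer -> integer address
    return (T*) addr;                   // integer address -> pointer of the same type
}

void show_round_trip() {
    int x = 61;
    int* ptr = &x;
    std::printf("&x = 0x%" PRIxPTR "\n", (uintptr_t) ptr);
    assert(round_trip(ptr) == ptr);     // information preserving: never fails
    assert(*round_trip(ptr) == 61);     // and the result still points at x
}
```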

Since it is a 64-bit architecture, the size of an x86-64 address is 64 bits (8 bytes). That’s also the size of x86-64 pointers.

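To represent an array of integers, C++ and C allocate the integers next to each other in memory, in sequential addresses, with no gaps or overlaps. Here, we put the integers 0, 1, and 258 next to each other, starting at address 1008. Each int occupies 4 bytes, so the elements start at addresses 1008, 1008 + 4, and 1008 + 8; on little-endian x86-64, their bytes are 00 00 00 00, 01 00 00 00, and 02 01 00 00 (258 = 0x102).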
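To look at this layout on a real machine, one can copy an array’s bytes out with std::memcpy. A sketch; raw_bytes and dump_array are hypothetical helpers, and the byte values in the comment assume 4-byte little-endian ints as on x86-64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Copy the n bytes starting at p into a vector, lowest address first.
std::vector<unsigned char> raw_bytes(const void* p, std::size_t n) {
    std::vector<unsigned char> out(n);
    std::memcpy(out.data(), p, n);
    return out;
}

// Print the array {0, 1, 258} one element per line. With 4-byte little-endian
// ints this prints "00 00 00 00", "01 00 00 00", "02 01 00 00".
void dump_array() {
    int a[3] = {0, 1, 258};
    std::vector<unsigned char> bytes = raw_bytes(a, sizeof a);
    assert(bytes.size() == 3 * sizeof(int));   // no gaps between the elements
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::printf("%02x%c", bytes[i], (i + 1) % sizeof(int) == 0 ? '\n' : ' ');
    }
}
```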

Say that you have an array of N integers, and you access each of those integers in order, accessing each integer exactly once. Does the order matter?

Computer memory is random-access memory (RAM), which means that a program can access any bytes of memory in any order—it’s not, for example, required to read memory in ascending order by address. But if we run experiments, we can see that even in RAM, different access orders have very different performance characteristics.

Our arraysum program sums up all the integers in an array of N integers, using an access order based on its arguments, and prints the resulting delay. We ran a couple of experiments accessing 10,000,000 items in three orders: “up” order (sequential: elements 0, 1, 2, 3, …), “down” order (reverse sequential: N−1, N−2, …, 0), and “random” order (as it sounds).

Wow! Down order is just a bit slower than up, but random order seems about 40 times slower. Why?

Random order is defeating many of the internal architectural optimizations that make memory access fast on modern machines. Sequential order, since it’s more predictable, is much easier to optimize.
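The arraysum source isn’t reproduced here, so the following is only a rough sketch of the same experiment; make_order, sum_in_order, and time_order are hypothetical names, and the timings it prints depend on the machine, the array size, and the compiler flags. The random order also reads an index array, but that array itself is accessed sequentially; only the accesses to the data follow the chosen order.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

// Build an index order for n elements: 'u' = up (0, 1, ..., n-1),
// 'd' = down (n-1, ..., 0), 'r' = a random permutation.
std::vector<std::size_t> make_order(std::size_t n, char kind, unsigned seed = 61) {
    assert(kind == 'u' || kind == 'd' || kind == 'r');
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    if (kind == 'd') {
        std::reverse(order.begin(), order.end());
    } else if (kind == 'r') {
        std::shuffle(order.begin(), order.end(), std::mt19937(seed));
    }
    return order;
}

// Sum every element exactly once, visiting them in the given order.
long sum_in_order(const std::vector<int>& data, const std::vector<std::size_t>& order) {
    long sum = 0;
    for (std::size_t idx : order) {
        sum += data[idx];
    }
    return sum;
}

// Time one pass and print the sum and the elapsed time in seconds.
void time_order(const std::vector<int>& data, char kind) {
    std::vector<std::size_t> order = make_order(data.size(), kind);
    auto start = std::chrono::steady_clock::now();
    long sum = sum_in_order(data, order);
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    std::printf("%c: sum %ld, %.6f s\n", kind, sum, elapsed.count());
}
```

For example, building std::vector<int> data(10'000'000, 1) and calling time_order(data, 'u'), then 'd', then 'r', mirrors the three experiments; compile with optimization for meaningful numbers.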

Foreshadowing. This part of the lecture is a teaser for the Storage unit, where we cover access patterns and caching, including the processor caches that explain this phenomenon, in much more depth.

The C++ programming language offers several collection mechanisms for grouping subobjects together into new kinds of object. The collections are arrays, structs, and unions. (Classes are a kind of struct. All library types, such as vectors, lists, and hash tables, use combinations of these collection types.) The abstract machine defines how subobjects are laid out inside a collection. This is important, because it lets C/C++ programs exchange messages with hardware and even with programs written in other languages: messages can be exchanged only when both parties agree on layout.

Array layout in C++ is particularly simple: The objects in an array are laid out sequentially in memory, with no gaps or overlaps. Assume a declaration like T x[N], where x is an array of N objects of type T, and say that the address of x is a. Then the address of element x[i] equals a + i * sizeof(T), and sizeof(x) == N * sizeof(T).
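The address formula can be checked on real arrays by converting addresses to integers. A sketch; layout_matches, check_layouts, and the Sample struct are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Sample { double d; char c; };   // hypothetical element type

// Check that &arr[i] == (address of arr) + i * sizeof(T) for every i,
// and that the whole array occupies exactly N * sizeof(T) bytes.
template <typename T, std::size_t N>
bool layout_matches(const T (&arr)[N]) {
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(&arr);
    for (std::size_t i = 0; i < N; ++i) {
        if (reinterpret_cast<std::uintptr_t>(&arr[i]) != a + i * sizeof(T)) {
            return false;
        }
    }
    return sizeof(arr) == N * sizeof(T);
}

void check_layouts() {
    Sample s[4] = {};
    short h[7] = {};
    assert(layout_matches(s) && layout_matches(h));
}
```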

Sidebar: Vector representation

The C++ library type std::vector defines an array that can grow and shrink. For instance, this function creates a vector containing the numbers 0 up to N in sequence:

Here, v is an object with automatic lifetime. This means its size (in the sizeof sense) is fixed at compile time. Remember that the sizes of static- and automatic-lifetime objects must be known at compile time; only dynamic-lifetime objects can have varying size based on runtime parameters. So where and how are v ’s contents stored?

The C++ abstract machine requires that v ’s elements are stored in an array in memory. (The v.data() method returns a pointer to the first element of the array.) But it does not define std::vector ’s layout otherwise, and C++ library designers can choose different layouts based on their needs. We found these to hold for the std::vector in our library:

sizeof(v) == 24 for any vector of any type, and the address of v is a stack address (i.e., v is located in the stack segment).

The first 8 bytes of the vector hold the address of the first element of the contents array—call it the begin address . This address is a heap address, which is as expected, since the contents must have dynamic lifetime. The value of the begin address is the same as that of v.data() .

Bytes 8–15 hold the address just past the contents array—call it the end address . Its value is the same as &v.data()[v.size()] . If the vector is empty, then the begin address and the end address are the same.

Bytes 16–23 hold an address greater than or equal to the end address. This is the capacity address. As a vector grows, it will sometimes outgrow its current location and move its contents to new memory addresses. To reduce the number of copies, vectors usually request more memory from the memory allocator than they immediately need; this additional space, which is called “capacity,” supports cheap growth. Often the capacity grows by a constant factor (commonly doubling) on each growth spurt, since this allows operations like v.push_back() to execute in O(1) time on average.
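The vector-building function itself isn’t shown above; this sketch assumes it appends 0 through N−1 with push_back. It uses only the portable interface (data(), size(), capacity()), not the library-specific 24-byte layout, and counts how often the begin address changes. With geometric growth, that count should grow roughly like log(N) rather than N. The names make_sequence, count_moves, and check_vector are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A vector holding 0, 1, ..., N-1, built one push_back at a time.
std::vector<int> make_sequence(int N) {
    std::vector<int> v;
    for (int i = 0; i < N; ++i) {
        v.push_back(i);
    }
    return v;
}

// Count how often push_back moves the contents to a new array, detected by
// watching the begin address (v.data()) change. The first allocation counts.
// Addresses are kept as integers so that no stale pointer is ever compared.
int count_moves(int N) {
    std::vector<int> v;
    std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(v.data());
    int moves = 0;
    for (int i = 0; i < N; ++i) {
        v.push_back(i);
        assert(v.capacity() >= v.size());   // capacity address >= end address
        std::uintptr_t now = reinterpret_cast<std::uintptr_t>(v.data());
        if (now != begin) {
            ++moves;
            begin = now;
        }
    }
    return moves;
}

void check_vector() {
    std::vector<int> v = make_sequence(10);
    assert(v.size() == 10 && v.front() == 0 && v.back() == 9);
    assert(v.data() + v.size() == &v.back() + 1);   // the end address
}
```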

Compilers must also decide where different objects are stored when those objects are not part of a collection. For instance, consider this program:

The abstract machine says these objects cannot overlap, but does not otherwise constrain their positions in memory.

On Linux, GCC will put all these variables into the stack segment, which we can see using hexdump. But it can put them in the stack segment in any order, as we can see by reordering the declarations (try declaration order i1, c1, i2, c2, c3), by changing optimization levels, or by adding different scopes (braces). The abstract machine gives the programmer no guarantees about how object addresses relate. In fact, the compiler may move objects around during execution, as long as it ensures that the program behaves according to the abstract machine. Modern optimizing compilers often do this, particularly for automatic objects.

But what order does the compiler choose? With optimization disabled, the compiler appears to lay out objects in decreasing order by declaration, so the first declared variable in the function has the highest address. With optimization enabled, the compiler follows roughly the same guideline, but it also rearranges objects by type (for instance, it tends to group chars together), and it can reuse space if different variables in the same function have disjoint lifetimes. The optimizing compiler tends to use less space for the same set of variables, because it arranges objects by alignment.

The C++ compiler and library restrict the addresses at which some kinds of data appear. In particular, the address of every int value is always a multiple of 4, whether it’s located on the stack (automatic lifetime), the data segment (static lifetime), or the heap (dynamic lifetime).

A bunch of observations will show you these rules:

These are the alignment restrictions for an x86-64 Linux machine.

These restrictions hold for most x86-64 operating systems, except that on Windows, the long type has size and alignment 4. (The long long type has size and alignment 8 on all x86-64 operating systems.)

Just like every type has a size, every type has an alignment. The alignment of a type T is a number a ≥ 1 such that the address of every object of type T must be a multiple of a. Every object with type T has size sizeof(T), meaning it occupies sizeof(T) contiguous bytes of memory, and has alignment alignof(T), meaning the address of its first byte is a multiple of alignof(T). You can also say sizeof(x) where x is the name of an object or another expression; GCC and Clang also accept alignof(x) as an extension, but standard C++ requires a type, as in alignof(decltype(x)).
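To see the sizes and alignments on your own machine, a sketch like this prints them with sizeof and alignof (SHOW and show_sizes_and_alignments are hypothetical names). On x86-64 Linux, as the following paragraphs explain, each fundamental type’s alignment equals its size, and alignof(std::max_align_t) is 16. The assertion checks a property the layout rules below imply: every size is a multiple of its alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Print the size and alignment of type T, and check that the size is a
// multiple of the alignment.
#define SHOW(T) do { \
        std::printf("%-16s sizeof %2zu  alignof %2zu\n", #T, sizeof(T), alignof(T)); \
        assert(sizeof(T) % alignof(T) == 0); \
    } while (0)

void show_sizes_and_alignments() {
    SHOW(char); SHOW(short); SHOW(int); SHOW(long); SHOW(long long);
    SHOW(float); SHOW(double); SHOW(long double); SHOW(void*);
    SHOW(std::max_align_t);
}
```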

Alignment restrictions can make hardware simpler, and therefore faster. For instance, consider cache blocks. CPUs access memory through a transparent hardware cache. Data moves from primary memory, or RAM (which is large, several gigabytes on most laptops, and uses cheaper, slower technology), to the cache in units of 64 or 128 bytes. Those units are always aligned: on a machine with 128-byte cache blocks, the bytes with memory addresses [127, 128, 129, 130] live in two different cache blocks (with addresses [0, 127] and [128, 255]). But the 4 bytes with addresses [4n, 4n+1, 4n+2, 4n+3] always live in the same cache block. (This is true for any small power of two: the 8 bytes with addresses [8n,…,8n+7] always live in the same cache block.) In general, it’s often possible to make a system faster by leveraging restrictions, and here, the CPU hardware can load data faster when it can assume that the data lives in exactly one cache block.
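The cache-block argument is plain integer arithmetic on addresses. A sketch with hypothetical helpers is_aligned and in_one_block; the block sizes 64 and 128 are the examples from the text, not properties the code measures on the real machine.

```cpp
#include <cassert>
#include <cstdint>

// True if addr is a multiple of align; align must be a power of two.
constexpr bool is_aligned(std::uintptr_t addr, std::uintptr_t align) {
    return (addr & (align - 1)) == 0;
}

// True if the n bytes [addr, addr + n) fall inside a single block of
// block_size bytes, where blocks start at multiples of block_size.
constexpr bool in_one_block(std::uintptr_t addr, std::uintptr_t n, std::uintptr_t block_size) {
    return addr / block_size == (addr + n - 1) / block_size;
}

static_assert(!in_one_block(127, 4, 128), "bytes 127..130 straddle two 128-byte blocks");

// Every 4-aligned 4-byte object stays inside one 64- or 128-byte block.
void check_aligned_ints_never_straddle() {
    for (std::uintptr_t addr = 0; addr < 4096; addr += 4) {
        assert(is_aligned(addr, 4));
        assert(in_one_block(addr, 4, 64) && in_one_block(addr, 4, 128));
    }
}
```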

The compiler, library, and operating system all work together to enforce alignment restrictions.

On x86-64 Linux, alignof(T) == sizeof(T) for all fundamental types (the types built in to C: integer types, floating point types, and pointers). But this isn’t always true; on x86-32 Linux, double has size 8 but alignment 4.

It’s possible to construct user-defined types of arbitrary size, but the largest alignment required by a machine is fixed for that machine. C++ lets you find the maximum alignment for a machine with alignof(std::max_align_t); on x86-64, this is 16, the alignment of the type long double (and the alignment of some less-commonly-used SIMD “vector” types).

We now turn to the abstract machine rules for laying out all collections. The sizes and alignments for user-defined types (arrays, structs, and unions) are derived from a couple of simple rules or principles. Here they are. The first rule applies to all types.

1. First-member rule. The address of the first member of a collection equals the address of the collection.

Thus, the address of an array is the same as the address of its first element. The address of a struct is the same as the address of the first member of the struct.

The next three rules depend on the class of collection. Every C abstract machine enforces these rules.

2. Array rule. Arrays are laid out sequentially as described above.

3. Struct rule. The second and subsequent members of a struct are laid out in order, with no overlap, subject to alignment constraints.

4. Union rule. All members of a union share the address of the union.

In C, every struct follows the struct rule, but in C++, only simple structs follow the rule. Complicated structs, such as structs with some public and some private members, or structs with virtual functions, can be laid out however the compiler chooses. The typical situation is that C++ compilers for a machine architecture (e.g., “Linux x86-64”) will all agree on a layout procedure for complicated structs. This allows code compiled by different compilers to interoperate.

The next rule defines the operation of the malloc library function.

5. Malloc rule. Any non-null pointer returned by malloc has alignment appropriate for any type. In other words, assuming the allocated size is adequate, the pointer returned from malloc can safely be cast to T* for any T .

Oddly, this holds even for small allocations. The C++ standard (the abstract machine) requires that malloc(1) return a pointer whose alignment is appropriate for any type, including types that don’t fit.

And the final rule is not required by the abstract machine, but it’s how sizes and alignments on our machines work.

6. Minimum rule. The sizes and alignments of user-defined types, and the offsets of struct members, are minimized within the constraints of the other rules.

The minimum rule, and the sizes and alignments of basic types, are defined by the x86-64 Linux “ABI”, its Application Binary Interface. This specification standardizes how x86-64 Linux C compilers should behave, and lets users mix and match compilers without problems.

Consequences of the size and alignment rules

From these rules we can derive some interesting consequences.

First, the size of every type is a multiple of its alignment.

To see why, consider an array with two elements. By the array rule, these elements have addresses a and a+sizeof(T), where a is the address of the array. Both of these addresses contain a T, so they are both a multiple of alignof(T). That means sizeof(T) is also a multiple of alignof(T).

We can also characterize the sizes and alignments of different collections.

  • The size of an array of N elements of type T is N * sizeof(T): the sum of the sizes of its elements. The alignment of the array is alignof(T).
  • The size of a union is the maximum of the sizes of its components (because the union can only hold one component at a time). Its alignment is also the maximum of the alignments of its components.
  • The size of a struct is at least as big as the sum of the sizes of its components. Its alignment is the maximum of the alignments of its components.

Thus, the alignment of every collection equals the maximum of the alignments of its components.

It’s also true that the alignment equals the least common multiple of the alignments of its components. You might have thought lcm was a better answer, but the max is the same as the lcm for every architecture that matters, because all fundamental alignments are powers of two.

The size of a struct might be larger than the sum of the sizes of its components, because of alignment constraints. Since the compiler must lay out struct components in order, and it must obey the components’ alignment constraints, and it must ensure different components occupy disjoint addresses, it must sometimes introduce extra space in structs. Here’s an example: the struct will have 3 bytes of padding after char c , to ensure that int i2 has the correct alignment.

Thanks to padding, reordering struct components can sometimes reduce the total size of a struct. Padding can happen at the end of a struct as well as the middle. Padding can never happen at the start of a struct, however (because of Rule 1).
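A sketch of padding and reordering, assuming x86-64 Linux sizes (int 4 bytes, double 8 bytes, each aligned to its size). The struct and member names are hypothetical; Padded is shaped like the example described above, with 3 bytes of padding after c.

```cpp
#include <cassert>
#include <cstddef>   // offsetof

// 3 bytes of padding follow c so that i2 starts at a multiple of alignof(int).
struct Padded { int i1; char c; int i2; };

// The same three members in two orders. With 8-byte-aligned doubles,
// Wasteful needs padding after a (middle) and after b (end);
// Compact needs padding only at the end.
struct Wasteful { char a; double d; char b; };
struct Compact  { double d; char a; char b; };

void check_padding() {
    // Member offsets are multiples of the member's alignment.
    assert(offsetof(Padded, i2) % alignof(int) == 0);
    assert(offsetof(Wasteful, d) % alignof(double) == 0);
    // Rule 1: no padding at the start.
    assert(offsetof(Padded, i1) == 0 && offsetof(Compact, d) == 0);
    // Sizes are multiples of alignment, which is what forces end padding.
    assert(sizeof(Compact) % alignof(Compact) == 0);
    assert(sizeof(Wasteful) % alignof(Wasteful) == 0);
}
```

On x86-64 Linux, this gives sizeof(Padded) == 12 with i2 at offset 8, sizeof(Wasteful) == 24, and sizeof(Compact) == 16.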

The rules also imply that the offset of any struct member (the difference between the address of the member and the address of the containing struct) is a multiple of the member’s alignment.

To see why, consider a struct s with member m at offset o . The malloc rule says that any pointer returned from malloc is correctly aligned for s . Every pointer returned from malloc is maximally aligned, equalling 16*x for some integer x . The struct rule says that the address of m , which is 16*x + o , is correctly aligned. That means that 16*x + o = alignof(m)*y for some integer y . Divide both sides by a = alignof(m) and you see that 16*x/a + o/a = y . But 16/a is an integer—the maximum alignment is a multiple of every alignment—so 16*x/a is an integer. We can conclude that o/a must also be an integer!

Finally, we can also derive the necessity for padding at the end of structs. (How?)

What happens when an object is uninitialized? The answer depends on its lifetime.

  • static lifetime (e.g., int global; at file scope): The object is initialized to 0.
  • automatic or dynamic lifetime (e.g., int local; in a function, or int* ptr = new int ): The object is uninitialized and reading the object’s value before it is assigned causes undefined behavior.
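A small sketch of the difference (all names hypothetical). A static-lifetime variable with no initializer starts at 0; an automatic one must be assigned before it is read. For dynamic objects, writing new int() (with parentheses) value-initializes the object to 0, unlike plain new int; std::make_unique<int>() does the same.

```cpp
#include <cassert>
#include <memory>

int global_counter;   // static lifetime, no initializer: starts at 0

// std::make_unique<int>() evaluates `new int()`, which value-initializes to 0.
// A plain `new int` would leave the object uninitialized.
std::unique_ptr<int> make_zero() {
    return std::make_unique<int>();
}

int sum_to(int n) {
    int total = 0;    // automatic lifetime: must be assigned before it is read
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

void check_initialization() {
    assert(global_counter == 0);
    assert(*make_zero() == 0);
    assert(sum_to(4) == 10);
}
```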

Compiler hijinks

In C++, most dynamic memory allocation uses special language operators, new and delete , rather than library functions.

Though this seems more complex than the library-function style, it has advantages. A C compiler cannot tell what malloc and free do (especially when they are redefined to debugging versions, as in the problem set), so a C compiler cannot necessarily optimize calls to malloc and free away. But the C++ compiler may assume that all uses of new and delete follow the rules laid down by the abstract machine. That means that if the compiler can prove that an allocation is unnecessary or unused, it is free to remove that allocation!

For example, we compiled this program in the problem set environment (based on test003.cc ):

The optimizing C++ compiler removes all calls to new and delete, leaving only the call to m61_printstatistics()! (For instance, try objdump -d testXXX to look at the compiled x86-64 instructions.) This is valid because the compiler is explicitly allowed to eliminate unused allocations, and here, since the ptrs variable is local and doesn’t escape main, all allocations are unused. The C compiler cannot perform this useful transformation. (But the C compiler can do other cool things, such as unroll the loops.)

One of C’s more interesting choices is that it explicitly relates pointers and arrays. Although arrays are laid out in memory in a specific way, they generally behave like pointers when they are used. This property probably arose from C’s desire to explicitly model memory as an array of bytes, and it has beautiful and confounding effects.

We’ve already seen one of these effects. The hexdump function has this signature (arguments and return type):

But we can just pass an array as argument to hexdump :

When used in an expression like this—here, as an argument—the array magically changes into a pointer to its first element. The above call has the same meaning as this:

C programmers transition between arrays and pointers very naturally.

A confounding effect is that arrays, unlike all other types, are not copied when passed to functions. C is otherwise a call-by-value language: function arguments and return values are copied, so modifications a function makes to its parameters do not affect the objects the caller passed. But an array argument turns into a pointer to its first element, so the function can modify the caller’s array, just as if the array had been passed by reference. (Functions cannot return arrays at all.) For instance:

```cpp
#include <cstdio>

void f(int a[2]) {   // really `int* a`: the caller's array is not copied
    a[0] = 1;
}

int main() {
    int x[2] = {100, 101};
    f(x);
    printf("%d\n", x[0]);   // prints 1!
}
```

If you don’t like this behavior, you can get around it by using a struct or a C++ std::array:

```cpp
#include <array>
#include <cstdio>

struct array1 {
    int a[2];
};

void f1(array1 arg) {             // the struct, including its array member, is copied
    arg.a[0] = 1;
}

void f2(std::array<int, 2> a) {   // std::array is copied too
    a[0] = 1;
}

int main() {
    array1 x = {{100, 101}};
    f1(x);
    printf("%d\n", x.a[0]);   // prints 100

    std::array<int, 2> x2 = {100, 101};
    f2(x2);
    printf("%d\n", x2[0]);    // prints 100
}
```

C++ extends the logic of this array–pointer correspondence to support arithmetic on pointers as well.

Pointer arithmetic rule. In the C abstract machine, arithmetic on pointers produces the same result as arithmetic on the corresponding array indexes.

Specifically, consider an array T a[n] and pointers T* p1 = &a[i] and T* p2 = &a[j] . Then:

Equality : p1 == p2 if and only if (iff) p1 and p2 point to the same address, which happens iff i == j .

Inequality : Similarly, p1 != p2 iff i != j .

Less-than : p1 < p2 iff i < j .

Also, p1 <= p2 iff i <= j ; and p1 > p2 iff i > j ; and p1 >= p2 iff i >= j .

Pointer difference : What should p1 - p2 mean? Using array indexes as the basis, p1 - p2 == i - j . (But the type of the difference is always ptrdiff_t , which on x86-64 is long , the signed version of size_t .)

Addition : p1 + k (where k is an integer type) equals the pointer &a[i + k] . ( k + p1 returns the same thing.)

Subtraction : p1 - k equals &a[i - k] .

Increment and decrement : ++p1 means p1 = p1 + 1 , which means p1 = &a[i + 1] . Similarly, --p1 means p1 = &a[i - 1] . (There are also postfix versions, p1++ and p1-- , but C++ style prefers the prefix versions.)

No other arithmetic operations on pointers are allowed. You can’t multiply pointers, for example. (You can multiply addresses by casting the pointers to the address type, uintptr_t —so (uintptr_t) p1 * (uintptr_t) p2 —but why would you?)
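Each of the rules above can be written as an assertion on a small array. A sketch (check_pointer_rules is a hypothetical name); the comments give the index arithmetic that each pointer expression mirrors.

```cpp
#include <cassert>
#include <cstddef>

// Each pointer operation mirrors the corresponding index operation on a[].
void check_pointer_rules() {
    int a[10] = {};
    int* p1 = &a[7];
    int* p2 = &a[3];

    assert(p1 != p2 && !(p1 == p2));      // 7 != 3
    assert(p2 < p1 && p1 >= p2);          // 3 < 7
    std::ptrdiff_t d = p1 - p2;           // 7 - 3: counts elements, not bytes
    assert(d == 4 && p2 - p1 == -4);
    assert(p2 + 4 == p1 && 4 + p2 == p1); // &a[3 + 4]
    assert(p1 - 7 == &a[0]);              // &a[7 - 7]
    int* p = p2;
    ++p;                                  // now &a[4]
    assert(p == &a[4]);
    assert(a + 10 == p1 + 3);             // one past the end: may be formed, not dereferenced
}
```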

From pointers to iterators

Let’s write a function that can sum all the integers in an array.

This function can compute the sum of the elements of any int array. But because of the pointer–array relationship, its argument a is really a pointer. That allows us to call it with subarrays as well as with whole arrays. For instance:

This way of thinking about arrays naturally leads to a style that avoids sizes entirely, using instead a sentinel or boundary argument that defines the end of the interesting part of the array.

These expressions compute the same sums as the above:

Note that the data from first to last forms a half-open range. In mathematical notation, we care about elements in the range [first, last): the element pointed to by first is included (if it exists), but the element pointed to by last is not. Half-open ranges give us a simple and clear way to describe empty ranges, such as zero-element arrays: if first == last, then the range is empty.

Note that given a ten-element array a , the pointer a + 10 can be formed and compared, but must not be dereferenced—the element a[10] does not exist. The C/C++ abstract machines allow users to form pointers to the “one-past-the-end” boundary elements of arrays, but users must not dereference such pointers.

So in C, two pointers naturally express a range of an array. The C++ standard template library, or STL, brilliantly abstracts this pointer notion to allow two iterators , which are pointer-like objects, to express a range of any standard data structure—an array, a vector, a hash table, a balanced tree, whatever. This version of sum works for any container of int s; notice how little it changed:

Some example uses:
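The lecture’s sum functions and example calls aren’t reproduced here; a sketch of the same idea could look like this. The names sum (pointer version), sum_iter (iterator version), and check_sums are hypothetical. The only change between the two loops is the parameter type.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Sum the half-open range [first, last) of an int array, given two pointers.
int sum(const int* first, const int* last) {
    int s = 0;
    for (; first != last; ++first) {
        s += *first;
    }
    return s;
}

// The same loop with the pointer type generalized to any iterator type.
template <typename It>
int sum_iter(It first, It last) {
    int s = 0;
    for (; first != last; ++first) {
        s += *first;
    }
    return s;
}

void check_sums() {
    int a[5] = {1, 2, 3, 4, 5};
    assert(sum(a, a + 5) == 15);        // whole array
    assert(sum(a + 1, a + 3) == 5);     // subarray a[1], a[2]
    assert(sum(a + 2, a + 2) == 0);     // empty range: first == last
    std::vector<int> v = {1, 2, 3};
    std::list<int> l = {10, 20};
    assert(sum_iter(v.begin(), v.end()) == 6);
    assert(sum_iter(l.begin(), l.end()) == 30);
    assert(sum_iter(a, a + 5) == 15);   // plain pointers are iterators too
}
```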

Addresses vs. pointers

What’s the difference between these expressions? (Again, a is an array of type T , and p1 == &a[i] and p2 == &a[j] .)

The first expression is defined analogously to index arithmetic, so d1 == i - j . But the second expression performs the arithmetic on the addresses corresponding to those pointers. We will expect d2 to equal sizeof(T) * d1 . Always be aware of which kind of arithmetic you’re using. Generally arithmetic on pointers should not involve sizeof , since the sizeof is included automatically according to the abstract machine; but arithmetic on addresses almost always should involve sizeof .
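A sketch of the two kinds of difference (check_differences is a hypothetical name). It converts to std::intptr_t rather than uintptr_t so that a negative difference stays negative instead of wrapping around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// d1 counts elements (pointer arithmetic); d2 counts bytes (address arithmetic).
template <typename T>
void check_differences(T* p1, T* p2) {
    std::ptrdiff_t d1 = p1 - p2;
    std::intptr_t d2 = reinterpret_cast<std::intptr_t>(p1) - reinterpret_cast<std::intptr_t>(p2);
    assert(d2 == d1 * static_cast<std::intptr_t>(sizeof(T)));
}
```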

Although C++ is a low-level language, the abstract machine is surprisingly strict about which pointers may be formed and how they can be used. Violate the rules and you’re in hell because you have invoked the dreaded undefined behavior.

Given an array a[N] of N elements of type T :

Forming a pointer &a[i] (or a + i ) with 0 ≤ i ≤ N is safe.

Forming a pointer &a[i] with i < 0 or i > N causes undefined behavior.

Dereferencing a pointer &a[i] with 0 ≤ i < N is safe.

Dereferencing a pointer &a[i] with i < 0 or i ≥ N causes undefined behavior.

(For the purposes of these rules, objects that are not arrays count as single-element arrays. So given T x , we can safely form &x and &x + 1 and dereference &x .)

What “undefined behavior” means is horrible. A program that executes undefined behavior is erroneous. But the compiler need not catch the error. In fact, the abstract machine says anything goes : undefined behavior is “behavior … for which this International Standard imposes no requirements.” “Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).” Other possible behaviors include allowing hackers from the moon to steal all of a program’s data, take it over, and force it to delete the hard drive on which it is running. Once undefined behavior executes, a program may do anything, including making demons fly out of the programmer’s nose.

Pointer arithmetic, and even pointer comparisons, are also affected by undefined behavior. It’s undefined to go beyond an array’s bounds using pointer arithmetic. And pointers may be compared for equality or inequality even if they point to different arrays or objects, but if you try to compare pointers into different arrays via less-than (for example, &a[0] < &b[0], where a and b are distinct arrays), that causes undefined behavior.

If you really want to compare pointers that might be to different arrays—for instance, you’re writing a hash function for arbitrary pointers—cast them to uintptr_t first.
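A sketch of that advice, with hypothetical helpers address_less and pointer_hash that work on the uintptr_t values. (The standard library’s std::less<T*> is also guaranteed to impose a consistent total order on pointers, even unrelated ones.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Order two arbitrary pointers by their numeric addresses. This is ordinary
// integer comparison, so it is fine even for pointers into different arrays.
bool address_less(const void* p, const void* q) {
    return reinterpret_cast<std::uintptr_t>(p) < reinterpret_cast<std::uintptr_t>(q);
}

// A simple hash for arbitrary pointers, also computed on the address value.
std::size_t pointer_hash(const void* p) {
    return std::hash<std::uintptr_t>{}(reinterpret_cast<std::uintptr_t>(p));
}

void check_address_order() {
    int a[2] = {0, 0};
    int b[2] = {0, 0};
    // Exactly one of the two orders holds for distinct objects.
    assert(address_less(a, b) != address_less(b, a));
    assert(!address_less(a, a));
}
```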

Undefined behavior and optimization

A program that causes undefined behavior is not a C++ program . The abstract machine says that a C++ program, by definition, is a program whose behavior is always defined. The C++ compiler is allowed to assume that its input is a C++ program. (Obviously!) So the compiler can assume that its input program will never cause undefined behavior. Thus, since undefined behavior is “impossible,” if the compiler can prove that a condition would cause undefined behavior later, it can assume that condition will never occur.

Consider this program:

If we supply a value equal to (char*) -1 , we’re likely to see output like this:

with no assertion failure! But that’s an apparently impossible result. The printout can only happen if x + 1 > x (otherwise, the assertion will fail and stop the printout). But x + 1 , which equals 0 , is less than x , which is the largest 8-byte value!

The impossible happens because of undefined behavior reasoning. When the compiler sees an expression like x + 1 > x (with x a pointer), it can reason this way:

“Ah, x + 1 . This must be a pointer into the same array as x (or it might be a boundary pointer just past that array, or just past the non-array object x ). This must be so because forming any other pointer would cause undefined behavior.

“The pointer comparison is the same as an index comparison. x + 1 > x means the same thing as &x[1] > &x[0] . But that holds iff 1 > 0 .

“In my infinite wisdom, I know that 1 > 0 . Thus x + 1 > x always holds, and the assertion will never fail.

“My job is to make this code run fast. The fastest code is code that’s not there. This assertion will never fail—might as well remove it!”
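The test the compiler removed can be written safely on addresses instead of pointers, because unsigned integer arithmetic is defined to wrap. A sketch; next_address_wraps and check_wrap are hypothetical names, and the original program is not reproduced.

```cpp
#include <cassert>
#include <cstdint>

// Pointer version, for contrast: the compiler may assume x + 1 > x always
// holds and fold it to true.   bool next_is_greater(char* x) { return x + 1 > x; }
// Address version: uintptr_t arithmetic wraps (modulo 2^64 on x86-64), so
// this test is meaningful and cannot be assumed away.
bool next_address_wraps(std::uintptr_t x) {
    return x + 1 < x;
}

void check_wrap() {
    assert(next_address_wraps(UINTPTR_MAX));   // the "(char*) -1" case: x + 1 == 0
    assert(!next_address_wraps(0x400abc));
}
```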

Integer undefined behavior

Arithmetic on signed integers also has important undefined behaviors. Signed integer arithmetic must never overflow. That is, the compiler may assume that the mathematical result of any signed arithmetic operation, such as x + y (with x and y both int ), can be represented inside the relevant type. It causes undefined behavior, therefore, to add 1 to the maximum positive integer. (The ubexplore.cc program demonstrates how this can produce impossible results, as with pointers.)

Arithmetic on unsigned integers is much safer with respect to undefined behavior. Unsigned integers are defined to perform arithmetic modulo their size. This means that if you add 1 to the maximum positive unsigned integer, the result will always be zero.

Dividing an integer by zero causes undefined behavior whether or not the integer is signed.

Sanitizers, which in our makefiles are turned on by supplying SAN=1 , can catch many undefined behaviors as soon as they happen. Sanitizers are built in to the compiler itself; a sanitizer involves cooperation between the compiler and the language runtime. This has the major performance advantage that the compiler introduces exactly the required checks, and the optimizer can then use its normal analyses to remove redundant checks.

That said, undefined behavior checking can still be slow. Undefined behavior allows compilers to make assumptions about input values, and those assumptions can directly translate to faster code. Turning on undefined behavior checking can make some benchmark programs run 30% slower [link] .

Signed integer undefined behavior

File cs61-lectures/datarep5/ubexplore2.cc contains the following program.

What will be printed if we run the program with ./ubexplore2 0x7ffffffe 0x7fffffff ?

0x7fffffff is the largest positive value that can be represented by type int. Adding one to this value in the hardware’s two’s-complement arithmetic yields 0x80000000, which in two’s-complement representation is the smallest negative number represented by type int.

Assuming that the program behaves this way, then the loop exit condition i > n2 can never be met, and the program should run (and print out numbers) forever.

However, if we run the optimized version of the program, it prints only two numbers and exits:

The unoptimized program does print forever and never exits.

What’s going on here? We need to look at the compiled assembly of the program with and without optimization (via objdump -S ).

The unoptimized version basically looks like this:

This is a pretty direct translation of the loop.

The optimized version, though, does it differently. As always, the optimizer has its own ideas. (Your compiler may produce different results!)

The compiler changed the source’s less than or equal to comparison, i <= n2, into a not equal to comparison in the executable, i != n2 + 1 (in both cases using signed computer arithmetic, i.e., modulo 2^32)! The comparison i <= n2 will always return true when n2 == 0x7FFFFFFF, the maximum signed integer, so the loop goes on forever. But the i != n2 + 1 comparison does not always return true when n2 == 0x7FFFFFFF: when i wraps around to 0x80000000 (the smallest negative integer), then i equals n2 + 1 (which also wrapped), and the loop stops.

Why did the compiler make this transformation? In the original loop, the step-6 jump is immediately followed by another comparison and jump in steps 1 and 2. The processor jumps all over the place, which can confuse its prediction circuitry and slow down performance. In the transformed loop, the step-7 jump is never followed by a comparison and jump; instead, step 7 goes back to step 4, which always prints the current number. This more streamlined control flow is easier for the processor to make fast.

But the streamlined control flow is only a valid substitution under the assumption that the addition n2 + 1 never overflows. Luckily (sort of), signed arithmetic overflow causes undefined behavior, so the compiler is totally justified in making that assumption!

Programs based on ubexplore2 have demonstrated undefined behavior differences for years, even as the precise reasons why have changed. In some earlier compilers, we found that the optimizer just upgraded the ints to longs: arithmetic on longs is just as fast on x86-64 as arithmetic on ints, since x86-64 is a 64-bit architecture, and sometimes using longs for everything lets the compiler avoid conversions back and forth. The ubexplore2l program demonstrates this form of transformation: since the loop variable is added to a long counter, the compiler opportunistically upgrades i to long as well. This transformation is also only valid under the assumption that i + 1 will not overflow, which it can’t, because of undefined behavior.

Using unsigned type prevents all this undefined behavior, because arithmetic overflow on unsigned integers is well defined in C/C++. The ubexplore2u.cc file uses an unsigned loop index and comparison, and ./ubexplore2u and ./ubexplore2u.noopt behave exactly the same (though you have to give arguments like ./ubexplore2u 0xfffffffe 0xffffffff to see the overflow).
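The ubexplore2.cc source is not reproduced here, so this is only a hedged sketch: one way to write an inclusive signed loop that never depends on overflow is to test for the last value before incrementing. The helper inclusive_range is hypothetical, not part of the lecture code:

```cpp
#include <climits>
#include <vector>

// Collects every int in [n1, n2] without ever evaluating i + 1 past n2,
// so the loop stays well defined even when n2 == INT_MAX.
std::vector<int> inclusive_range(int n1, int n2) {
    std::vector<int> values;
    if (n1 > n2) {
        return values;
    }
    for (int i = n1; ; ++i) {
        values.push_back(i);
        if (i == n2) {
            break;  // exit before ++i could overflow
        }
    }
    return values;
}
```

Called as inclusive_range(0x7ffffffe, 0x7fffffff), this returns exactly two values, with or without optimization, because no signed overflow ever happens.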

Computer arithmetic and bitwise operations

Basic bitwise operators.

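Computers offer not only the usual arithmetic operators like + and -, but also a set of bitwise operators. The basic ones are & (and), | (or), ^ (xor/exclusive or), and the unary operator ~ (complement). In truth table form, for single bits a and b: a & b is 1 only when both a and b are 1; a | b is 1 when at least one of them is 1; a ^ b is 1 when exactly one of them is 1; and ~a is 1 when a is 0, and 0 when a is 1.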

In C or C++, these operators work on integers. But they work bitwise: the result of an operation is determined by applying the operation independently at each bit position. Here’s how to compute 12 & 4 in 4-bit unsigned arithmetic: 12 is 1100 and 4 is 0100, and and-ing each bit position gives 0100, so 12 & 4 == 4.
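A small sketch of the four basic operators on the same operands. Masking with 0xF is an assumption used here to mimic 4-bit unsigned integers, since ~ flips every bit of a full-width unsigned int:

```cpp
// 12 is 1100 and 4 is 0100 in binary; each operator works bit by bit.
constexpr unsigned twelve = 12u;
constexpr unsigned four = 4u;

constexpr unsigned and_result = twelve & four;   // 0100 = 4
constexpr unsigned or_result  = twelve | four;   // 1100 = 12
constexpr unsigned xor_result = twelve ^ four;   // 1000 = 8
// ~ flips all bits of a full-width unsigned; & 0xF keeps the 4-bit view.
constexpr unsigned not_result = ~twelve & 0xFu;  // 0011 = 3
```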

These basic bitwise operators simplify certain important computations. For example, (x & (x - 1)) == 0 tests whether x is zero or a power of 2.
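A sketch of this test as a C++ function (the name is hypothetical). Subtracting 1 flips the lowest set bit and every 0 bit below it, so x & (x - 1) clears the lowest set bit; the result is zero only when x had at most one bit set:

```cpp
// True when x is zero or a power of two (at most one bit set).
constexpr bool is_zero_or_pow2(unsigned x) {
    return (x & (x - 1)) == 0;  // for x == 0, x - 1 wraps to all ones, and 0 & anything == 0
}
```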

Negation of signed integers can also be expressed using a bitwise operator: -x == ~x + 1. This is in fact how we define two's complement representation. We can verify that x and -x do add up to zero under this representation: x + ~x has every bit set, so adding 1 carries out of the top bit and leaves zero.
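A sketch of this identity in code, assuming 32-bit values held in uint32_t so that all arithmetic is well defined modulo 2^32 (the function name is hypothetical):

```cpp
#include <cstdint>

// Two's complement negation: flip every bit, then add 1.
constexpr std::uint32_t twos_complement_negate(std::uint32_t x) {
    return ~x + 1;
}
```

For example, twos_complement_negate(5) is 0xFFFFFFFB, the 32-bit pattern of -5, and 5 plus that value wraps to 0.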

Bitwise "and" (&) can help with modular arithmetic. For example, for unsigned x, x % 32 == (x & 31). We essentially "mask off", or clear, the higher-order bits to do modulo-power-of-2 arithmetic. The same idea works in any base: in decimal, the fastest way to compute x % 100 is to take just the two least significant digits of x.
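A sketch of the masking trick (the helper name is hypothetical). It assumes unsigned x: for negative signed values, C++ defines % to truncate toward zero, so -1 % 32 is -1 while -1 & 31 is 31.

```cpp
// For unsigned x, x % 2^k == x & (2^k - 1): the mask keeps only the low k bits.
constexpr unsigned mod32(unsigned x) {
    return x & 31u;  // 31 is 11111 in binary, so this keeps the low 5 bits
}
```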

Bitwise shift of unsigned integer

x << i appends i zero bits at the least significant end of x, shifting the existing bits up. High-order bits that don't fit in the integer are thrown out. For example, assuming 4-bit unsigned integers, 0011 << 2 == 1100 and 1011 << 1 == 0110.

Similarly, for unsigned x, x >> i inserts i zero bits at the most significant end of x, and the low-order bits are thrown out. For example, 1100 >> 2 == 0011.
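To run these 4-bit examples, one can emulate 4-bit unsigned integers by masking results to the low 4 bits. The helpers shl4 and shr4 are hypothetical:

```cpp
// Shift left, then discard any bits that no longer fit in 4 bits.
constexpr unsigned shl4(unsigned x, unsigned i) {
    return (x << i) & 0xFu;
}

// Keep only the 4-bit value, then shift right; zeros fill in from the top.
constexpr unsigned shr4(unsigned x, unsigned i) {
    return (x & 0xFu) >> i;
}
```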

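Bitwise shift helps with multiplication and division by powers of 2. For unsigned x, x << i equals x * 2^i (as long as no bits are shifted out), and x >> i equals x / 2^i, rounded down. For example: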

A modern compiler can optimize y = x * 66 into y = (x << 6) + (x << 1) .
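A sketch of the shift identities for unsigned values (helper names hypothetical). Because unsigned arithmetic wraps modulo 2^N on both sides, times66(x) equals x * 66 for every unsigned x, not just small ones:

```cpp
// x << i == x * 2^i and x >> i == x / 2^i (rounded down) for unsigned x.
constexpr unsigned times66(unsigned x) {
    return (x << 6) + (x << 1);  // 64x + 2x
}

constexpr unsigned div8(unsigned x) {
    return x >> 3;  // divide by 2^3, discarding the remainder
}
```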

Bitwise operations also allow us to treat bits within an integer separately. This can be useful for "options".

For example, when we call a function to open a file, we have a lot of options:

  • Open for reading?
  • Open for writing?
  • Read from the end?
  • Optimize for writing?

We have a lot of true/false options.

One bad way to implement this is to have the function take a bunch of arguments, one for each option, so that every call passes a long list of true/false values.

The long list of arguments slows down the function call, and one can also easily lose track of the meaning of the individual true/false values passed in.

A cheaper way to achieve this is to use a single integer to represent all the options. Have each option defined as a power of 2, and simply | (or) them together and pass them as a single integer.

Flags are usually defined as powers of 2 so we set one bit at a time for each flag. It is less common but still possible to define a combination flag that is not a power of 2, so that it sets multiple bits in one go.
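A sketch of the flags approach. The constant names and the has_flags helper are hypothetical, chosen to mirror the options listed above; they are not the real POSIX O_* constants used by open():

```cpp
// Hypothetical option flags; each one owns a single bit.
constexpr unsigned OPEN_READ       = 1u << 0;  // open for reading
constexpr unsigned OPEN_WRITE      = 1u << 1;  // open for writing
constexpr unsigned OPEN_AT_END     = 1u << 2;  // read from the end
constexpr unsigned OPEN_FAST_WRITE = 1u << 3;  // optimize for writing
// A combination flag that sets two bits at once (not a power of 2).
constexpr unsigned OPEN_READ_WRITE = OPEN_READ | OPEN_WRITE;

// True when every bit of `flag` is also set in `flags`.
constexpr bool has_flags(unsigned flags, unsigned flag) {
    return (flags & flag) == flag;
}
```

A caller would then pass a single integer such as OPEN_READ | OPEN_AT_END, and the callee tests individual bits with has_flags.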

File cs61-lectures/datarep5/mb-driver.cc contains a memory allocation benchmark. The core of the benchmark looks like this:

The benchmark tests the performance of the memnode_arena::allocate() and memnode_arena::deallocate() functions. In the handout code, these functions do the same thing as new memnode and delete memnode: they are wrappers for malloc and free. The benchmark allocates 4096 memnode objects, then performs noperations free-then-allocate steps on them, and then frees all of them.

We only allocate memnodes, and all memnodes are the same size, so we don't need metadata that keeps track of the size of each allocation. Furthermore, since all dynamically allocated data is freed at the end of the function, we don't really need to return memory to the system allocator on each individual memnode_free() call. We can simply reuse this memory during the function and return all of it to the system at once when the function exits.

If we run the benchmark with 100000000 allocations and use the system malloc() and free() functions to implement the memnode allocator, the benchmark finishes in 0.908 seconds.

Our alternative implementation of the allocator finishes in 0.355 seconds, beating the heavily optimized system allocator by a factor of about 2.5. We will reveal how we achieved this in the next lecture.

We continue our exploration with the memnode allocation benchmark introduced in the last lecture.

File cs61-lectures/datarep6/mb-malloc.cc contains a version of the benchmark using the system new and delete operators.

In this function we allocate an array of 4096 pointers to memnodes, which occupy 2^3 × 2^12 = 2^15 bytes on the stack. We then allocate 4096 memnodes. Our memnode is defined like this:

Each memnode contains a std::string object and an unsigned integer. Each std::string object internally contains a pointer to a character array on the heap. Therefore, every time we create a new memnode, we need two allocations: one for the memnode itself, and another performed internally by the std::string object when we initialize or assign a string value to it.

Every time we deallocate a memnode by calling delete, we also destroy its std::string object, and the string object knows that it should deallocate the heap character array it maintains internally. So two deallocations also occur each time we free a memnode.

We make the benchmark return a seemingly meaningless result to prevent an aggressive compiler from optimizing everything away. We also use this result to check that our subsequent optimizations to the allocator are correct: they must produce the same result.
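The mb-malloc.cc source isn't shown here. The following is only a hedged reconstruction of the benchmark shape described above: the field names file and line, the round-robin choice of which node to recycle, and the exact result computation are assumptions, not the lecture's code. Depending on the standard library's small-string optimization, a short string may not need a heap allocation at all.

```cpp
#include <string>

// Hypothetical memnode: a std::string plus an unsigned integer.
struct memnode {
    std::string file;
    unsigned line;
};

// Allocates 4096 memnodes, frees and reallocates one node per operation,
// then frees everything. The returned sum keeps the work from being optimized away.
unsigned long memnode_benchmark(unsigned long noperations) {
    constexpr unsigned nnodes = 4096;
    memnode* m[nnodes];  // 4096 pointers * 8 bytes = 2^15 bytes on a 64-bit machine

    for (unsigned i = 0; i != nnodes; ++i) {
        m[i] = new memnode;                     // allocation 1: the node
        m[i]->file = "datarep/mb-filename.cc";  // allocation 2 (maybe): the string's buffer
        m[i]->line = i;
    }

    unsigned long result = 0;
    for (unsigned long op = 0; op != noperations; ++op) {
        unsigned pos = op % nnodes;  // assumption: recycle nodes round-robin
        result += m[pos]->line;
        delete m[pos];               // frees the string's buffer, then the node
        m[pos] = new memnode;
        m[pos]->file = "datarep/mb-filename.cc";
        m[pos]->line = op + nnodes;
    }

    for (unsigned i = 0; i != nnodes; ++i) {
        delete m[i];
    }
    return result;
}
```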

This version of the benchmark, using the system allocator, finishes in 0.335 seconds. Not bad at all.

Spoiler alert: We can do 15x better than this.

1st optimization: std::string

We only deal with one file name, namely "datarep/mb-filename.cc", which is constant throughout the program for all memnodes. It's also a string literal, which means it has static lifetime. Why not simply use a const char* in place of the std::string and let the pointer point to the static constant string? This saves us the internal allocation and deallocation that std::string performs every time we initialize or destroy a string.

The fix is easy: we simply change the memnode definition:
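A sketch of what the changed definition might look like, assuming hypothetical field names file and line and a hypothetical memnode_filename constant. The static_assert checks that destroying a memnode now does no work at all:

```cpp
#include <type_traits>

// One static string shared by every node; nothing is allocated or freed for it.
static const char memnode_filename[] = "datarep/mb-filename.cc";

struct memnode {
    const char* file;  // points to static storage and is never owned by the node
    unsigned line;
};

// With no std::string member, the memnode destructor is trivial.
static_assert(std::is_trivially_destructible_v<memnode>);
```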

This version of the benchmark now finishes in 0.143 seconds, a 2x improvement over the original benchmark. This 2x improvement is consistent with the 2x reduction in the number of allocations and deallocations mentioned earlier.

You may ask why people still use std::string if it involves an additional allocation and is slower than const char*, as shown in this benchmark. std::string is much more flexible: it also handles data that doesn't have static lifetime, such as input from a user or data the program receives over the network. In short, when the program deals with strings that are not constant, heap data is likely to be very useful, and std::string provides facilities to conveniently handle on-heap data.

2nd optimization: the system allocator

We still use the system allocator to allocate and deallocate memnodes. The system allocator is a general-purpose allocator, which means it must handle allocation requests of all sizes. Such general-purpose designs usually come with a performance compromise. Since we only allocate memnodes, which are fairly small objects (and all the same size), we can build a special-purpose allocator just for them.

In cs61-lectures/datarep5/mb2.cc, we actually implement a special-purpose allocator for memnodes:

This allocator maintains a free list (a C++ vector) of freed memnodes. allocate() simply pops a memnode off the free list if there is one, and deallocate() simply puts the memnode on the free list. This free list serves as a buffer between the system allocator and the benchmark function, so that the system allocator is invoked less frequently. In fact, in the benchmark, the system allocator is invoked only 4096 times, when the pointer array is initialized. That's a huge reduction, because none of the 10 million "recycle" operations in the middle involve the system allocator anymore.

With this special-purpose allocator we can finish the benchmark in 0.057 seconds, another 2.5x improvement.

However, this allocator now leaks memory: it never actually calls delete! Let's fix this by letting it also keep track of all allocated memnodes. The modified definition of memnode_arena now looks like this:

With the updated allocator we simply need to invoke arena.destroy_all() at the end of the function to fix the memory leak. And we don't even need to invoke this method manually! We can use the C++ destructor for the memnode_arena struct, defined as ~memnode_arena() in the code above, which is automatically called when our arena object goes out of scope. We simply make the destructor invoke the destroy_all() method, and we are all set.
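A hedged sketch of an arena along these lines, not the lecture's mb2.cc: it keeps a std::vector free list plus an allocated list, and its destructor calls destroy_all(). The member names free_list and allocated, the const char* memnode fields, and the deleted copy operations (added so two arenas never delete the same nodes) are assumptions:

```cpp
#include <vector>

struct memnode {
    const char* file;
    unsigned line;
};

struct memnode_arena {
    std::vector<memnode*> free_list;  // freed nodes waiting to be reused
    std::vector<memnode*> allocated;  // every node ever obtained from new

    memnode_arena() = default;
    // Copying would let two arenas delete the same nodes, so forbid it.
    memnode_arena(const memnode_arena&) = delete;
    memnode_arena& operator=(const memnode_arena&) = delete;

    memnode* allocate() {
        if (!free_list.empty()) {
            memnode* m = free_list.back();  // recycle: no system allocator call
            free_list.pop_back();
            return m;
        }
        memnode* m = new memnode;
        allocated.push_back(m);
        return m;
    }

    void deallocate(memnode* m) {
        free_list.push_back(m);  // keep the node for reuse instead of deleting it
    }

    void destroy_all() {
        for (memnode* m : allocated) {
            delete m;
        }
        allocated.clear();
        free_list.clear();
    }

    ~memnode_arena() {
        destroy_all();  // runs automatically when the arena goes out of scope
    }
};
```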

Fixing the leak doesn't appear to affect performance at all. This is because the overhead of tracking the allocated list and calling delete only affects the initial allocation of the 4096 memnodes, plus the cleanup at the very end. These 8192 additional operations are a relatively small number compared to the 10 million recycle operations, so the added overhead is hardly noticeable.

Spoiler alert: We can improve this by another factor of 2.

3rd optimization: std::vector

In our special-purpose allocator memnode_arena, we maintain both an allocated list and a free list using C++ std::vectors. std::vectors are dynamic arrays, and like std::string they involve an additional level of indirection and store the actual array on the heap. We don't access the allocated list during the "recycling" part of the benchmark (which takes the bulk of the benchmark time, as we showed earlier), so the allocated list is probably not our bottleneck. However, we add and remove elements from the free list in each recycle operation, and the indirection introduced by the std::vector here may actually be our bottleneck. Let's find out.

Instead of using a std::vector, we could use a linked list of all free memnodes as the free list. We need to include some extra metadata in the memnode to store the pointers for this linked list. However, unlike in the debugging allocator pset, in a free list we don't need to store this metadata in addition to the actual memnode data: the memnode is free and not in use, so we can reuse its memory, using a union:

We then maintain the free list like this:

Compared to the std::vector free list, this free list always points directly to an available memnode when it is not empty (free_list != nullptr), without going through any indirection. With the std::vector free list, one would first have to go into the heap to access the array containing pointers to free memnodes, and then access the memnode itself.
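A sketch of the union-based free list under stated assumptions: the union name freeable_memnode and its member next_free are hypothetical, and the allocated vector is kept only so the destructor can free every node:

```cpp
#include <vector>

struct memnode {
    const char* file;
    unsigned line;
};

// While a node is free, its memnode fields are unused, so the same bytes
// can hold the pointer to the next free node instead.
union freeable_memnode {
    memnode n;
    freeable_memnode* next_free;
};

struct memnode_arena {
    freeable_memnode* free_list = nullptr;     // head of the linked free list
    std::vector<freeable_memnode*> allocated;  // every node, for cleanup only

    memnode_arena() = default;
    memnode_arena(const memnode_arena&) = delete;
    memnode_arena& operator=(const memnode_arena&) = delete;

    memnode* allocate() {
        freeable_memnode* fn;
        if (free_list != nullptr) {
            fn = free_list;  // reuse a node: one pointer load, no heap array
            free_list = fn->next_free;
        } else {
            fn = new freeable_memnode;
            allocated.push_back(fn);
        }
        return &fn->n;
    }

    void deallocate(memnode* m) {
        // A union and its members share an address, so this cast is valid.
        auto fn = reinterpret_cast<freeable_memnode*>(m);
        fn->next_free = free_list;  // push onto the front of the list
        free_list = fn;
    }

    ~memnode_arena() {
        for (freeable_memnode* fn : allocated) {
            delete fn;
        }
    }
};
```

The free list is last-in, first-out, so the most recently freed node, which is likely still in cache, is handed out next.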

With this change we can now finish the benchmark in under 0.03 seconds! Another 2x improvement over the previous one!

Compared to the benchmark with the system allocator (which finished in 0.335 seconds), we managed to achieve a speedup of nearly 15x with arena allocation.

Getuplearn

Data Representation in Computer: Number Systems, Characters, Audio, Image and Video

  • Post author: Anuj Kumar
  • Post published: 16 July 2021
  • Post category: Computer Science

What is Data Representation in Computer?

A computer uses a fixed number of bits to represent a piece of data which could be a number, a character, image, sound, video, etc. Data representation is the method used internally to represent data in a computer. Let us see how various types of data can be represented in computer memory.

Before discussing data representation of numbers, let us see what a number system is.

Number Systems

A number system is a technique for representing numbers in computer system architecture: every value that you save to or read from computer memory has a defined number system.

A number is a mathematical object used to count, label, and measure. A number system is a systematic way to represent numbers. The number system we use in our day-to-day life is the decimal number system that uses 10 symbols or digits.

The number 289 is pronounced as two hundred and eighty-nine and it consists of the symbols 2, 8, and 9. Similarly, there are other number systems. Each has its own symbols and method for constructing a number.

A number system has a unique base, which depends upon the number of symbols. The number of symbols used in a number system is called the base or radix of a number system.

Let us discuss some of the number systems. Computer architecture supports the following number systems:

  • Binary number system
  • Octal number system
  • Decimal number system
  • Hexadecimal number system

A binary number system has only two digits, 0 and 1. Every number (value) is represented using 0 and 1 in this number system. The base of the binary number system is 2 because it has only two digits.

The octal number system has eight (8) digits, from 0 to 7. Every number (value) is represented using 0, 1, 2, 3, 4, 5, 6 and 7 in this number system. The base of the octal number system is 8 because it has only 8 digits.

The decimal number system has ten (10) digits, from 0 to 9. Every number (value) is represented using 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 in this number system. The base of the decimal number system is 10 because it has only 10 digits.

A hexadecimal number system has sixteen (16) alphanumeric symbols, 0 to 9 and A to F. Every number (value) is represented using 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F in this number system. The base of the hexadecimal number system is 16 because it has 16 symbols.

Here A is 10, B is 11, C is 12, D is 13, E is 14 and F is 15 .
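A sketch of converting a value into any of these bases by repeated division (to_base is a hypothetical helper): each remainder is the next digit, least significant first. Using the number 289 from earlier, it gives 100100001 in binary, 441 in octal, 289 in decimal, and 121 in hexadecimal.

```cpp
#include <string>

// Converts n to a string of digits in the given base (assumes 2 <= base <= 16).
std::string to_base(unsigned n, unsigned base) {
    const char digits[] = "0123456789ABCDEF";  // A = 10, ..., F = 15
    if (n == 0) {
        return "0";
    }
    std::string out;
    while (n != 0) {
        out.insert(out.begin(), digits[n % base]);  // remainder is the next digit
        n /= base;
    }
    return out;
}
```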

Data Representation of Characters

There are different methods to represent characters. Some of them are discussed below:

The code called ASCII (pronounced "ASK-ee"), which stands for American Standard Code for Information Interchange, uses 7 bits to represent each character in computer memory. The ASCII representation has been adopted as a standard by the U.S. government and is widely accepted.

A unique integer is assigned to each character. This number, called the ASCII code of the character, is converted into binary for storage in memory. For example, the ASCII code of A is 65, and its 7-bit binary equivalent is 1000001.

Since there are exactly 128 (2^7) unique combinations of 7 bits, this 7-bit code can represent only 128 characters. Another version, ASCII-8, also called extended ASCII, uses 8 bits for each character and can represent 256 (2^8) different characters.

For example, the letter A is represented by 01000001, B by 01000010 and so on. ASCII code is enough to represent all of the standard keyboard characters.
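A sketch that prints these bit patterns with std::bitset (ascii_bits is a hypothetical helper). It assumes the compiler's execution character set is ASCII-compatible, as it is on common platforms:

```cpp
#include <bitset>
#include <string>

// Returns the 8-bit binary pattern of a character's code, e.g. 'A' -> "01000001".
std::string ascii_bits(char c) {
    return std::bitset<8>(static_cast<unsigned char>(c)).to_string();
}
```

The 7-bit form is the same pattern without the leading 0: std::bitset<7>('A').to_string() gives "1000001".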

EBCDIC stands for Extended Binary Coded Decimal Interchange Code. It is similar to ASCII and is an 8-bit code used in computers manufactured by International Business Machines (IBM). It is capable of encoding 256 characters.

If ASCII-coded data is to be used in a computer that uses EBCDIC representation, it is necessary to transform ASCII code to EBCDIC code. Similarly, if EBCDIC coded data is to be used in an ASCII computer, EBCDIC code has to be transformed to ASCII.

ISCII stands for Indian Standard Code for Information Interchange or Indian Script Code for Information Interchange. It is an encoding scheme for representing various writing systems of India. ISCII uses 8 bits for data representation.

It was developed by a standardization committee under the Department of Electronics during 1986-88 and adopted by the Bureau of Indian Standards (BIS). Nowadays, ISCII has been replaced by Unicode.

Using 8-bit ASCII, we can represent only 256 characters. This cannot cover all the characters of the world's written languages and other symbols. Unicode was developed to resolve this problem. It aims to provide a standard character encoding scheme that is universal and efficient.

It provides a unique number for every character, regardless of language or platform. Unicode originally used 16 bits, which can represent up to 65,536 characters. It is maintained by a non-profit organization called the Unicode Consortium.

The Consortium published version 1.0.0 in 1991 and continues to develop standards based on that original work. Today, Unicode code points go up to U+10FFFF, which needs 21 bits, so it can represent many more characters. Unicode can represent characters in almost all written languages of the world.

Data Representation of Audio, Image and Video

Often we have to represent and process data other than numbers and characters, such as audio, images, and video. Like numbers and characters, audio, image, and video data also carry information.

We will now look at different file formats for storing sound, images, and video.

Multimedia data such as audio, image, and video are stored in different types of files. The variety of file formats is due to the fact that there are quite a few approaches to compressing the data and a number of different ways of packaging the data.

For example, an image is most popularly stored in the Joint Photographic Experts Group (JPEG) file format. An image file consists of two parts: header information and image data. Information such as the name of the file, its size, modified date, file format, etc. is stored in the header part.

The intensity value of all pixels is stored in the data part of the file. The data can be stored uncompressed or compressed to reduce the file size. Normally, the image data is stored in compressed form. Let us understand what compression is.

Take a simple example of a pure black image of size 400 × 400 pixels. We can repeat the information black, black, …, black for all 160,000 (400 × 400) pixels. This is the uncompressed form, whereas in the compressed form black is stored only once, together with the information to repeat it 160,000 times.
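Storing a value once together with a repeat count is the idea behind run-length encoding. A minimal sketch of it follows; rle_encode is a hypothetical helper, a pixel value such as 0 could stand for black, and real image formats use more elaborate schemes:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Run-length encoding: each run is stored as (value, number of repeats).
template <typename T>
std::vector<std::pair<T, std::size_t>> rle_encode(const std::vector<T>& pixels) {
    std::vector<std::pair<T, std::size_t>> runs;
    for (const T& p : pixels) {
        if (!runs.empty() && runs.back().first == p) {
            ++runs.back().second;    // same value as the current run: extend it
        } else {
            runs.emplace_back(p, 1); // new value: start a new run
        }
    }
    return runs;
}
```

For the 400 × 400 all-black image, this produces a single run: (black, 160000).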

Numerous such techniques are used to achieve compression. Depending on the application, images are stored in various file formats such as bitmap (BMP), Tagged Image File Format (TIFF), Graphics Interchange Format (GIF), and Portable Network Graphics (PNG).

What we said about header information and compression also applies to audio and video files. Digital audio data can be stored in different file formats like WAV, MP3, MIDI, AIFF, etc. An audio file format, sometimes referred to as a 'container format', describes how digital audio data is stored.

For example, the WAV file format typically contains uncompressed sound, and MP3 files typically contain compressed audio data. Synthesized music data is stored in MIDI (Musical Instrument Digital Interface) files.

Similarly, video is also stored in different file formats, such as AVI (Audio Video Interleave), a format designed to store both audio and video data in a standard package that allows synchronized audio and video playback, as well as MP4, MPEG-2, WMV, etc.

K12 LibreTexts

2.1: Types of Data Representation

Two common types of graphic displays are bar charts and histograms. Both bar charts and histograms use vertical or horizontal bars to represent the number of data points in each category or interval. The main difference graphically is that in a bar chart there are spaces between the bars, while in a histogram there are no spaces between the bars. Why does this subtle difference exist, and what does it imply about graphic displays in general?

Displaying Data

It is often easier for people to interpret the relative sizes of data when the data is displayed graphically. Note that a categorical variable is a variable that can take on one of a limited number of values, and a quantitative variable is a variable that takes on numerical values that represent a measurable quantity. Examples of categorical variables are TV stations, the state someone lives in, and eye color, while examples of quantitative variables are the heights of students or the population of a city. There are a few common ways of displaying data graphically that you should be familiar with.

A pie chart shows the relative proportions of data in different categories. Pie charts are excellent ways of displaying categorical data with easily separable groups. The following pie chart shows six categories labeled A−F. The size of each pie slice is determined by its central angle. Since there are 360° in a circle, the central angle θ_A of category A can be found by: θ_A = (number of data points in category A ÷ total number of data points) × 360°.

CK-12 Foundation -  https://www.flickr.com/photos/slgc/16173880801  - CCSA

A bar chart displays frequencies of categories of data. The bar chart below has 5 categories and shows the TV channel preferences of 53 adults. The horizontal axis could also have been labeled News, Sports, Local News, Comedy, Action Movies. The bars are separated by spaces to emphasize the fact that they are categories and not continuous numbers. For example, just because you split your time between channel 8 and channel 44 does not mean that on average you watch channel 26. Categories can be numbers, so you need to be very careful.

A histogram displays frequencies of quantitative data that has been sorted into intervals. The following is a histogram that shows the heights of a class of 53 students. Notice the largest category is 56-60 inches with 18 people.
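A sketch of sorting quantitative data into equal-width intervals, the counting step behind a histogram. histogram_counts is hypothetical; it assumes half-open intervals [lo + k*width, lo + (k+1)*width) and simply ignores values outside the covered range:

```cpp
#include <cstddef>
#include <vector>

// counts[k] = number of values in the k-th interval.
std::vector<std::size_t> histogram_counts(const std::vector<double>& values,
                                          double lo, double width, std::size_t nbins) {
    std::vector<std::size_t> counts(nbins, 0);
    for (double v : values) {
        if (v < lo) {
            continue;  // below the first interval
        }
        std::size_t k = static_cast<std::size_t>((v - lo) / width);
        if (k < nbins) {
            ++counts[k];  // values past the last interval are skipped
        }
    }
    return counts;
}
```

Because adjacent intervals share a boundary, the bars drawn from these counts touch, which is exactly why a histogram has no spaces between bars.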

A boxplot (also known as a box and whiskers plot) is another way to display quantitative data. It displays the five-number summary (minimum, Q1, median, Q3, maximum). The box can be displayed either vertically or horizontally, depending on the labeling of the axis. The box does not need to be perfectly symmetrical because it represents data that might not be perfectly symmetrical.

Earlier, you were asked about the difference between histograms and bar charts. The reason for the space in bar charts but no space in histograms is that bar charts graph categorical variables, while histograms graph quantitative variables. It would be extremely improper to omit the space in bar charts, because you would run the risk of implying a spectrum from one side of the chart to the other. Note that in the bar chart where TV stations were shown, the station numbers were not listed horizontally in order by size. This was to emphasize the fact that the stations were categories.

Create a boxplot of the following numbers in your calculator.

8.5, 10.9, 9.1, 7.5, 7.2, 6, 2.3, 5.5

Enter the data into L1 by going into the Stat menu.

CK-12 Foundation - CCSA

Then turn the statplot on and choose boxplot.

Use Zoomstat to automatically center the window on the boxplot.
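A sketch of computing the same five-number summary in code (the function names are hypothetical). Quartile conventions differ between tools; this one takes Q1 and Q3 as the medians of the lower and upper halves, leaving out the middle value when the count is odd, so other software may report slightly different quartiles. For the data above it should give a minimum of 2.3, Q1 = 5.75, median 7.35, Q3 = 8.8 and maximum 10.9.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Median of the already-sorted range v[first, last) (assumes it is non-empty).
double median_of(const std::vector<double>& v, std::size_t first, std::size_t last) {
    std::size_t n = last - first;
    std::size_t mid = first + n / 2;
    return n % 2 == 1 ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;
}

// Returns {minimum, Q1, median, Q3, maximum}; assumes at least two values.
std::array<double, 5> five_number_summary(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    std::size_t n = v.size();
    std::size_t half = n / 2;                     // size of each half, middle value excluded
    double q1 = median_of(v, 0, half);            // median of the lower half
    double q3 = median_of(v, n - half, n);        // median of the upper half
    return {v.front(), q1, median_of(v, 0, n), q3, v.back()};
}
```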

Create a pie chart to represent the preferences of 43 hungry students.

  • Other – 5
  • Burritos – 7
  • Burgers – 9
  • Pizza – 22
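A sketch of the arithmetic behind this pie chart, applying central angle = (count ÷ total) × 360° to each category (pie_angles is hypothetical). For these 43 students, pizza gets about 184.2° and other about 41.9°.

```cpp
#include <string>
#include <utility>
#include <vector>

// Converts (category, count) pairs into (category, central angle in degrees).
std::vector<std::pair<std::string, double>>
pie_angles(const std::vector<std::pair<std::string, int>>& counts) {
    int total = 0;
    for (const auto& c : counts) {
        total += c.second;
    }
    std::vector<std::pair<std::string, double>> angles;
    for (const auto& c : counts) {
        angles.emplace_back(c.first, 360.0 * c.second / total);  // share of the circle
    }
    return angles;
}
```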

Create a bar chart representing the preference for sports of a group of 23 people.

  • Football – 12
  • Baseball – 10
  • Basketball – 8
  • Hockey – 3

Create a histogram for the income distribution of 200 million people.

  • Below $50,000 is 100 million people
  • Between $50,000 and $100,000 is 50 million people
  • Between $100,000 and $150,000 is 40 million people
  • Above $150,000 is 10 million people

1. What types of graphs show categorical data?

2. What types of graphs show quantitative data?

A math class of 30 students had the following grades:

3. Create a bar chart for this data.

4. Create a pie chart for this data.

5. Which graph do you think makes a better visual representation of the data?

A set of 20 exam scores is 67, 94, 88, 76, 85, 93, 55, 87, 80, 81, 80, 61, 90, 84, 75, 93, 75, 68, 100, 98

6. Create a histogram for this data. Use your best judgment to decide what the intervals should be.

7. Find the five-number summary for this data.

8. Use the five-number summary to create a boxplot for this data.

9. Describe the data shown in the boxplot below.

10. Describe the data shown in the histogram below.

A math class of 30 students has the following eye colors:

11. Create a bar chart for this data.

12. Create a pie chart for this data.

13. Which graph do you think makes a better visual representation of the data?

14. Suppose you have data that shows the breakdown of registered Republicans by state. What types of graphs could you use to display this data?

15. From which types of graphs could you obtain information about the spread of the data? Note that spread is a measure of how spread out all of the data is.

Review (Answers)

To see the Review answers, open this  PDF file  and look for section 15.4. 

About Data Representation

Data can be anything, including a number, a name, musical notes, or the colour of an image. The way that we store, process, and transmit data is referred to as data representation. We can use any device, including computers, smartphones, and iPads, to store data in digital format. The stored data is handled by electronic circuitry. A bit is a 0 or 1 used in digital data representation.

Data Representation Techniques

Classification of Computers

Computers are classified broadly based on their speed and computing power.

1. Microcomputers or PCs (Personal Computers): A microcomputer is a single-user computer system with a medium-power microprocessor. It is so called because it has a microprocessor as its central processing unit.

[Figure: Microcomputer]

2. Mini-Computer: It is a multi-user computer system that can support hundreds of users at the same time.

[Figure: Types of Mini Computers]

3. Mainframe Computer: Like a minicomputer, it is a multi-user computer system that can support hundreds of users at the same time, but its software technology is distinct from minicomputer technology.

[Figure: Mainframe Computer]

4. Super-Computer: A very fast computer, able to process hundreds of millions of instructions per second. Supercomputers are used for specialised applications requiring enormous amounts of mathematical computation, but they are very expensive.

[Figure: Supercomputer]

Types of Number Systems Used in Computers

A number system is the method used to represent numbers in computer system architecture, and every value saved to or obtained from computer memory uses a specific number system. One needs to be familiar with number systems in order to read computer language or interact with the system.

[Figure: Types of Number System]

1. Binary Number System 

There are only two digits in a binary number system: 0 and 1. In this number system, 0 and 1 stand in for every number (value). Because the binary number system only has two digits, its base is 2.

A bit is another name for each binary digit. The binary number system is also a positional value system, where each digit's value is expressed in powers of 2.
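The positional rule above can be checked with a short Python sketch. The function name binary_to_decimal is hypothetical and the value 1011 is only an illustration. It walks the digits from right to left, multiplying each bit by the matching power of 2 and adding up the results. Python's built-in int(bits, 2) performs the same conversion and is useful as a cross-check.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum each bit times the power of 2 for its position (rightmost position = 2**0)."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        total += int(bit) * 2 ** position
    return total


# 1011 = 1*2**3 + 0*2**2 + 1*2**1 + 1*2**0 = 8 + 0 + 2 + 1
print(binary_to_decimal("1011"))  # 11
```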

Characteristics of Binary Number System

The following are the primary characteristics of the binary system:

It only has two digits, zero and one.

Depending on its position, each digit has a different value.

Each position represents a power of the base, 2.

Because computers work internally with two voltage levels (on and off), binary is used in all types of computers.

[Figure: Binary Number System]

2. Decimal Number System

The decimal number system is a base-ten number system with ten digits ranging from 0 to 9, and these ten digits can represent any numerical quantity. The decimal number system is also a positional value system, which means that the value of a digit is determined by its position.

Characteristics of Decimal Number System

Ten units of a given order equal one unit of the higher order, making it a decimal system.

The number 10 serves as the base of the decimal number system.

The value of each digit or number will depend on where it is located within the numeric figure because it is a positional system.

The value of a number is the sum of each digit multiplied by the power of 10 for its position.
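As a worked example (the number 4725 is chosen only for illustration), each digit is multiplied by the power of 10 for its position and the products are added:

```latex
4725 = 4\times10^{3} + 7\times10^{2} + 2\times10^{1} + 5\times10^{0}
     = 4000 + 700 + 20 + 5
```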

[Figure: Decimal Number System]

[Figure: Decimal Binary Conversion Table]
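A decimal-to-binary conversion table can be produced with the repeated-division method: divide by 2, record the remainder as the next bit (from right to left), and continue with the quotient until it reaches 0. The Python sketch below is one way to do this. The function name decimal_to_binary is hypothetical, and the range 0 to 15 was chosen because it covers every 4-bit pattern.

```python
def decimal_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))  # each remainder is the next bit, right to left
        n //= 2
    return "".join(reversed(remainders))


# Print a small decimal-to-binary table for 0..15.
for value in range(16):
    print(f"{value:>2}  {decimal_to_binary(value):>4}")
```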

3. Octal Number System

There are only eight (8) digits in the octal number system, from 0 to 7. In this number system, each number (value) is represented by the digits 0, 1, 2, 3, 4, 5, 6, and 7. Since the octal number system only has 8 digits, its base is 8.

Characteristics of Octal Number System:

Contains eight digits: 0, 1, 2, 3, 4, 5, 6, 7.

Also known as the base 8 number system.

The rightmost position of an octal number represents the 0th power of the base (8^0).

The leftmost position represents 8^x, where x is the number of digits minus one.
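Because 8 = 2^3, every octal digit corresponds to exactly 3 bits. This means binary can be converted to octal by grouping bits in threes, starting from the right. Here is a minimal Python sketch of that grouping. The function name binary_to_octal is hypothetical.

```python
def binary_to_octal(bits: str) -> str:
    """Group bits in threes from the right; each group of 3 bits is one octal digit."""
    padded = bits.zfill((len(bits) + 2) // 3 * 3)  # left-pad to a multiple of 3
    groups = [padded[i:i + 3] for i in range(0, len(padded), 3)]
    digits = "".join(str(int(group, 2)) for group in groups)
    return digits.lstrip("0") or "0"


print(binary_to_octal("110101"))  # 110 101 -> 65
```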

[Figure: Octal Number System]

4. Hexadecimal Number System

There are sixteen (16) alphanumeric values in the hexadecimal number system, ranging from 0 to 9 and A to F. In this number system, each number (value) is represented by 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. Because the hexadecimal number system has 16 alphanumeric values, its base is 16. Here, the letters stand for A = 10, B = 11, C = 12, D = 13, E = 14, and F = 15.

Characteristics of Hexadecimal Number System:

It is a positional number system.

It has 16 symbols or digits overall (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F). Its base is, therefore, 16.

Decimal values 10, 11, 12, 13, 14, and 15 are represented by the letters A, B, C, D, E, and F, respectively.

A single digit may have a maximum value of 15.

Each digit position corresponds to a different power of the base (16).

Since there are only 16 digits, any hexadecimal digit can be represented in binary with 4 bits (2^4 = 16).
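The 4-bit correspondence means binary converts to hexadecimal by grouping bits in fours from the right. Here is a minimal Python sketch. The names HEX_DIGITS and binary_to_hex are hypothetical, and uppercase letters are assumed for A to F.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    """Group bits in fours from the right; each 4-bit group is one hexadecimal digit."""
    padded = bits.zfill((len(bits) + 3) // 4 * 4)  # left-pad to a multiple of 4
    digits = "".join(HEX_DIGITS[int(padded[i:i + 4], 2)] for i in range(0, len(padded), 4))
    return digits.lstrip("0") or "0"


print(binary_to_hex("11111010"))  # 1111 1010 -> FA
```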

[Figure: Hexadecimal Number System]

So, we've seen how to convert decimals and use the Number System to communicate with a computer. The full character set of the English language, which includes all alphabets, punctuation marks, mathematical operators, special symbols, etc., must be supported by the computer in addition to numerical data. 
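One widely used way to support characters is a standard character code such as ASCII, which gives each letter, digit, and symbol a number that is then stored in binary. The Python sketch below assumes ASCII and 8-bit storage per character. The function name text_to_ascii_bits is hypothetical. Python's ord returns a Unicode code point, which is the same as the ASCII code for characters 0 to 127.

```python
def text_to_ascii_bits(text: str) -> list[tuple[str, int, str]]:
    """Return (character, code, 8-bit pattern) for each ASCII character in `text`."""
    rows = []
    for char in text:
        code = ord(char)  # Unicode code point; equals the ASCII code for ASCII characters
        if code > 127:
            raise ValueError(f"{char!r} is not an ASCII character")
        rows.append((char, code, format(code, "08b")))
    return rows


for char, code, bits in text_to_ascii_bits("Hi!"):
    print(char, code, bits)
```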

Learning By Doing

Choose the correct answer:

1. Which computer is the largest in terms of size?

Minicomputer

Micro Computer

2. The binary number 11011001 is converted to what decimal value?

Solved Questions

1. Give some examples where Supercomputers are used.

Ans: Weather prediction, scientific simulations, graphics, fluid dynamics calculations, nuclear energy research, electronic engineering, and analysis of geological data.

2. Which of these is the most costly?

Mainframe computer

Ans: C) Supercomputer


FAQs on Introduction to Data Representation

1. What is the distinction between the Hexadecimal and Octal Number System?

The octal number system is a base-8 number system in which the digits 0 through 7 are used to represent numbers. The hexadecimal number system is a base-16 number system that employs the digits 0 through 9 as well as the letters A through F to represent numbers.

2. What is the smallest data representation?

The smallest unit of data is the bit. A byte, which comprises 8 bits, is the smallest unit of storage that can be addressed in most computers' memory.

3. What is the largest data unit?

The largest commonly available data storage unit is a terabyte or TB. A terabyte equals 1,000 gigabytes, while a tebibyte equals 1,024 gibibytes.
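The gap between the decimal (SI) terabyte and the binary (IEC) tebibyte is easy to quantify. The short Python sketch below computes both sizes in bytes. The constant names are illustrative.

```python
TERABYTE = 1000 ** 4  # decimal (SI) prefix: 1 TB = 1,000 GB = 1,000,000,000,000 bytes
TEBIBYTE = 1024 ** 4  # binary (IEC) prefix: 1 TiB = 1,024 GiB = 1,099,511,627,776 bytes

print(f"1 TB  = {TERABYTE:,} bytes")
print(f"1 TiB = {TEBIBYTE:,} bytes")
print(f"difference: {TEBIBYTE - TERABYTE:,} bytes ({TEBIBYTE / TERABYTE - 1:.1%})")
```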


What are the different ways of Data Representation?

Statistics is the process of collecting data in large quantities and analyzing it. It is a branch of mathematics that deals with the collection, analysis, interpretation, and presentation of numerical facts and figures.

Statistics, as numerical statements of facts, helps us collect and analyze data in large quantities. It is based on two concepts:

  • Statistical Data 
  • Statistical Science

Statistics must be expressed numerically and should be collected systematically.

Data Representation

The word data refers to facts about people, things, events, or ideas. It can be a title, an integer, or anything else. After collecting data, the investigator has to condense it into tabular form to study its salient features. Such an arrangement is known as the presentation of data.

It refers to the process of condensing the collected data in a tabular form or graphically. This arrangement of data is known as Data Representation.

The data can be arranged in different orders: ascending, descending, or alphabetical.

Example: Let the marks obtained by 10 students of class V in a class test, out of 50, listed by roll number, be: 39, 44, 49, 40, 22, 10, 45, 38, 15, 50. Data in this form is known as raw data. It can be placed in serial order as shown below:

Roll No. | Marks
1 | 39
2 | 44
3 | 49
4 | 40
5 | 22
6 | 10
7 | 45
8 | 38
9 | 15
10 | 50

To analyse the standard of achievement of the students, arranging the marks in ascending or descending order gives a better picture:

Ascending order: 10, 15, 22, 38, 39, 40, 44, 45, 49, 50
Descending order: 50, 49, 45, 44, 40, 39, 38, 22, 15, 10

Data placed in ascending or descending order is known as arrayed data.
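Turning raw data into arrayed data is simply sorting. Here is a minimal Python sketch using the ten marks from the example above.

```python
# Marks out of 50 for roll numbers 1..10, taken from the example above.
marks = [39, 44, 49, 40, 22, 10, 45, 38, 15, 50]

ascending = sorted(marks)
descending = sorted(marks, reverse=True)

print(ascending)   # [10, 15, 22, 38, 39, 40, 44, 45, 49, 50]
print(descending)  # [50, 49, 45, 44, 40, 39, 38, 22, 15, 10]
```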

Types of Graphical Data Representation

A bar chart helps us represent collected data, such as amounts and frequencies, visually. The bars can be drawn horizontally or vertically, and they can be grouped or single. Bar charts help us compare different items: by looking at all the bars, it is easy to see which categories in a group of data are larger or smaller than the others.

Let us understand bar charts with an example. Let the marks obtained by 5 students of class V in a class test, out of 10, be 7, 8, 4, 9, and 6. This raw data can be drawn as a bar chart from the following table:

Name | Marks
Akshay | 7
Maya | 8
Dhanvi | 4
Jaslen | 9
Muskan | 6
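As a rough illustration of how the bars relate to the values, here is a minimal Python sketch that draws a horizontal text bar chart for the five students. The function name text_bar_chart and the "#" symbol are hypothetical choices, not a real charting API. Each bar's length equals the mark, so bars can be compared directly.

```python
marks = {"Akshay": 7, "Maya": 8, "Dhanvi": 4, "Jaslen": 9, "Muskan": 6}


def text_bar_chart(data: dict[str, int], symbol: str = "#") -> str:
    """Draw one horizontal bar per category; the bar length equals the value."""
    width = max(len(name) for name in data)  # align the bars after the longest name
    lines = [f"{name:<{width}} | {symbol * value} {value}" for name, value in data.items()]
    return "\n".join(lines)


print(text_bar_chart(marks))
```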

A histogram is a graphical representation of data. It looks similar to a bar graph, but the two differ: a bar graph shows the frequency of categorical data, that is, data based on two or more categories such as gender or months, whereas a histogram is used for quantitative data.

For example:
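A histogram first groups quantitative values into intervals and then counts how many values fall in each interval. The Python sketch below does this for the ten class-test marks from the arrayed-data example earlier. The intervals of width 10 starting at 0 are an assumption, and the function name histogram_counts is hypothetical.

```python
def histogram_counts(values: list[int], start: int, width: int, bins: int) -> list[tuple[str, int]]:
    """Count values in each integer interval start + k*width .. start + (k+1)*width - 1."""
    counts = [0] * bins
    for v in values:
        k = (v - start) // width  # index of the interval containing v
        if not 0 <= k < bins:
            raise ValueError(f"{v} lies outside the chosen intervals")
        counts[k] += 1
    return [(f"{start + k * width}-{start + (k + 1) * width - 1}", c) for k, c in enumerate(counts)]


marks = [39, 44, 49, 40, 22, 10, 45, 38, 15, 50]
for interval, count in histogram_counts(marks, start=0, width=10, bins=6):
    print(f"{interval:>5} | {'#' * count}")
```

Unlike a bar chart of named categories, the intervals here sit next to each other on a number line, which is why histogram bars are usually drawn touching.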

A line graph uses points connected by lines to show change over time. Line graphs can show, for example, the number of animals of a species left on earth, the growth of the world's population day by day, or the day-by-day rise and fall in the value of bitcoin. Line graphs tell us how quantities change over time, and a single line graph can show two or more such changes at once.

For Example:

A pie chart is a circular graph that represents numerical proportions as slices. In most cases it can be replaced by other plots such as a bar chart, box plot, or dot plot. Research has shown that it is difficult to compare the different sections of a single pie chart, or to compare data across different pie charts.

Frequency Distribution Table

A frequency distribution table summarises the values in a data set and how often each one occurs. The table has two columns: the first lists the various outcomes in the data, and the second lists the frequency of each outcome. Putting data into such a table makes it easier to understand and analyze.

For example, suppose a baseball team scored 0 runs in the 1st, 4th, 7th, and 8th innings, 1 run in the 2nd, 5th, and 9th innings, 2 runs in the 6th inning, and 3 runs in the 3rd inning. To create a frequency distribution table, we first list all the outcomes in the data: 0 runs, 1 run, 2 runs, and 3 runs. We list these in numerical order in the first column. Next, we count how many times each outcome occurred and write that frequency in the second column. The table is a much more useful way to show this data:

Baseball Team Runs Per Inning
Number of Runs | Frequency
0 | 4
1 | 3
2 | 1
3 | 1
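The counting step can be done with Python's collections.Counter. In this sketch, the per-inning list is read from the innings described in the example above. Sorting the keys lists the outcomes in numerical order, as in the first column of the table.

```python
from collections import Counter

# Runs scored in innings 1 to 9, as described in the example above.
runs_per_inning = [0, 1, 3, 0, 1, 2, 0, 0, 1]

frequency = Counter(runs_per_inning)  # maps each outcome to how often it occurred

print("Number of Runs | Frequency")
for runs in sorted(frequency):  # list outcomes in numerical order
    print(f"{runs} | {frequency[runs]}")
```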

Sample Questions

Question 1: The school fee submission status of 10 students of class 10 is given below:

To draw a bar graph for this data, we first prepare the frequency table:

Fee submission | No. of Students
Paid | 6
Not paid | 4

Now we represent the data using a bar graph, following the steps below:

Step 1: Draw the two axes of the graph, the X-axis and the Y-axis. The categories of the data go on the X-axis (the horizontal line) and the frequencies go on the Y-axis (the vertical line).
Step 2: Give the Y-axis a numeric scale that starts from zero and ends at the highest value of the data.
Step 3: Choose a suitable interval for the scale, such as 0, 1, 2, 3, ... or 0, 10, 20, 30, ... or 0, 20, 40, 60, ...
Step 4: Label the X-axis appropriately.
Step 5: Draw the bars according to the data, keeping in mind that all the bars should have the same width and the same spacing between them.

Question 2: Observe the following pie chart, which shows the money spent by Megha at the funfair. Each colour indicates the amount paid for each item. The total amount is 15, and the amount paid for each item is as follows:

Chocolates – 3

Wafers – 3

Toys – 2

Rides – 7

To convert this into pie chart percentages, we apply the formula:

(Frequency / Total Frequency) × 100

Converting the data above into percentages:

Amount paid on rides: (7/15) × 100 ≈ 47%
Amount paid on toys: (2/15) × 100 ≈ 13%
Amount paid on wafers: (3/15) × 100 = 20%
Amount paid on chocolates: (3/15) × 100 = 20%
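The same formula can be applied to every category at once. Here is a minimal Python sketch. The function name pie_percentages is hypothetical, and rounding to whole percentages is an assumption that matches the answers above.

```python
def pie_percentages(amounts: dict[str, int]) -> dict[str, int]:
    """Apply (frequency / total frequency) x 100 to each category, rounded to whole percent."""
    total = sum(amounts.values())
    return {item: round(amount * 100 / total) for item, amount in amounts.items()}


spending = {"Rides": 7, "Toys": 2, "Wafers": 3, "Chocolates": 3}
print(pie_percentages(spending))  # {'Rides': 47, 'Toys': 13, 'Wafers': 20, 'Chocolates': 20}
```

Here the rounded values happen to sum to 100. With other data, rounding each slice separately can leave the total at 99 or 101.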

Question 3: The line graph given below shows how Devdas's height changes as he grows. Observe the graph and answer the questions below.

[Figure: line graph of Devdas's height against his age]

(i) What was Devdas's height at 8 years? Answer: 65 inches
(ii) What was Devdas's height at 6 years? Answer: 50 inches
(iii) What was Devdas's height at 2 years? Answer: 35 inches
(iv) How much did Devdas grow from 2 to 8 years? Answer: 30 inches
(v) When was Devdas 35 inches tall? Answer: At 2 years.




CodeAvail

Mastering the Art of Data Representation Statistics 


In today’s world, data is king. From businesses to healthcare to government, everyone relies on data to make informed decisions. But raw data can be overwhelming and difficult to make sense of. This is where data representation statistics come in. In this blog post, we will explore the importance of data representation statistics and how they can help you make sense of your data.

What are Data Representation Statistics?


Data representation statistics is the process of converting raw data into a format that is easy to understand and interpret. This involves using various statistical methods to analyze and summarize the data. Data representation statistics can help you identify patterns, trends, and relationships in your data, which can help you make informed decisions.

Why are Data Representation Statistics Important?

Data representation statistics are important for several reasons:

Helps you make informed decisions

By converting raw data into a format that is easy to understand and interpret, it can help you make informed decisions.

Identifies patterns and trends 

It can help you identify patterns and trends in your data that may not be obvious when looking at raw data.

Communicate your findings 

It can help you communicate your findings to others in a clear and concise manner.

Provides insights 

It can provide insights into your data that you may not have considered.

Enables data-driven decision making 

By providing insights and identifying patterns and trends, data representation statistics can enable data-driven decision-making.

Methods of Data Representation Statistics

Tables – Tables are a simple and effective way to present data in rows and columns, allowing for easy comparison and summarization.

Bar charts – Bar charts are used to compare the frequency or distribution of data points in different categories, with each category represented by a separate bar.

Line charts – Line charts are used to show trends in data over time, with data points connected by a line.

Scatter plots – Scatter plots are used to show the relationship between two variables, with each data point represented by a dot on a two-dimensional graph.

Pie charts – Pie charts are used to show the distribution of data points in different categories as a percentage of the whole, with each category represented by a slice of a circular graph.

Box plots – Box plots are used to show the distribution of data points, with the box representing the interquartile range (IQR), the whiskers extending toward the minimum and maximum of the data (often limited to 1.5 × IQR from the box), and outliers beyond the whiskers represented by dots or asterisks.

Heat maps – Heat maps are used to show the density of data points in a two-dimensional grid, with different colors representing different levels of density.

Histograms – Histograms are used to show the frequency distribution of a single variable, with the data grouped into intervals and represented as bars on a graph.

Frequency tables – Frequency tables are used to summarize the frequency distribution of a single variable, with the data grouped into intervals and displayed in a table.

Stacked bar charts – Stacked bar charts are used to compare the frequency or distribution of data points in different categories, with each bar divided into segments representing different subcategories.

Box and whisker plots – Box and whisker plots are another name for box plots: the box represents the IQR and the whiskers extend toward the smallest and largest values of the data.

Stem and leaf plots – Stem and leaf plots are used to show the distribution of data points, with the stems representing the tens or hundreds digit and the leaves representing the ones or units digit.

Time series plots – Time series plots are used to show trends in data over time, with data points plotted on a graph with a time axis.

Polar plots – Polar plots are used to show the distribution of data points in a circular graph, with the distance from the center representing the value of a variable and the angle representing a category.

Waterfall charts – Waterfall charts are used to show the changes in a variable over time, with each change represented by a segment of a bar that rises or falls.

Dot plots – Dot plots are used to show the distribution of data points, with each data point represented by a dot on a horizontal axis.

Radial bar charts – Radial bar charts are used to show the distribution of data points in a circular graph, with each bar representing a category and the length of the bar representing the value of a variable.

Area charts – Area charts are used to show the trend of data over time, with data points connected by a line and the area between the line and the x-axis shaded.

Radar charts – Radar charts are used to show the distribution of data points in a circular graph, with each category represented by a spoke and the distance of the point along the spoke representing the value of a variable.

Violin plots – Violin plots are used to show the distribution of data points, with the shape of the plot representing the density of the data.

Gantt charts – Gantt charts are used to show the timeline of a project, with each task represented by a horizontal bar and the length of the bar representing the duration of the task.

Chord diagrams – Chord diagrams are used to show the relationships between different categories, with the size of the chords representing the strength of the relationships.

Word clouds – Word clouds are used to show the frequency of words in a text document, with more frequently used words displayed in larger fonts.

Sankey diagrams – Sankey diagrams are used to show the flow of data between different categories, with the width of the lines representing the volume of the data.

Spider charts – Spider charts, another name for radar charts, are used to show the distribution of data points in a circular graph, with each variable represented by a spoke and the distance of the point along the spoke representing the value of the variable.

Map charts – Map charts are used to show the distribution of data points on a map, with each data point represented by a symbol or a color.

Tree maps – Tree maps are used to show the hierarchical structure of data, with each level represented by a rectangle and the size of the rectangle representing the value of the data.

Bullet charts – Bullet charts are used to show the progress towards a goal, with a bar representing the actual value, a short line perpendicular to the bar marking the target value, and shaded background bands showing qualitative ranges.

Heat bars – Heat bars are used to show the density of data points in a one-dimensional graph, with different colors representing different levels of density.

Contour plots – Contour plots are used to show a three-dimensional surface in two dimensions, with lines connecting points of equal value.

Motion charts – Motion charts are used to show changes in data over time, with data points moving on a graph.

Funnel charts – Funnel charts are used to show the conversion rates of a process, with each step of the process represented by a decreasing bar.

Marimekko charts – Marimekko charts are used to show the relationship between two categorical variables, with the width of the bars representing the relative size of the categories.

Sparklines – Sparklines are used to show the trends in data over time, with data points represented as a small line or bar within a larger text document or table.

Polar area charts – Polar area charts are used to show the distribution of data points in a circular graph, with the area of the segment representing the value of a variable.

Candlestick charts – Candlestick charts are used to show the daily changes in the price of a financial asset, with each candlestick representing the opening, closing, high, and low prices.

Radar area charts – Radar area charts are used to show the distribution of data points in a circular graph, with each variable represented by a spoke and the area of the shape representing the value of the variable.

Donut charts – Donut charts are similar to pie charts, but with a hole in the center, allowing for the display of additional information.

3D plots – 3D plots are used to show the shape of data in three dimensions, with different colors or shades representing different levels of the data.
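Several of the methods above, box plots in particular, are built from the five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. Here is a minimal Python sketch using the standard library's statistics.quantiles, applied to the ten class-test marks from the arrayed-data example earlier. Quartile conventions differ between textbooks and tools. This sketch relies on the function's default "exclusive" method, so other methods may give slightly different Q1 and Q3. The 1.5 × IQR outlier rule is a common convention rather than the only one. The function names are hypothetical.

```python
import statistics


def five_number_summary(data: list[float]) -> tuple[float, float, float, float, float]:
    """Return (minimum, Q1, median, Q3, maximum).

    Quartiles come from statistics.quantiles with its default 'exclusive' method;
    other textbooks and tools may use slightly different quartile rules.
    """
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)


def outliers(data: list[float]) -> list[float]:
    """Values more than 1.5 x IQR beyond the box, a common (not universal) box-plot rule."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


marks = [39, 44, 49, 40, 22, 10, 45, 38, 15, 50]
print(five_number_summary(marks))  # (10, 20.25, 39.5, 46.0, 50)
print(outliers(marks))             # []
```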

How to Find the Methods of Data Representation Statistics

There are several ways to find the methods of data representation statistics, including:

Research online

There are a multitude of resources available online that can provide information on various methods of data representation statistics. These can include websites, academic journals, and forums.

Consult textbooks

Textbooks on statistics and data analysis often contain sections or chapters dedicated to data visualization techniques, which can provide information on various methods of data representation statistics.

Attend training courses

Many training courses on statistics and data analysis will cover various methods of data representation statistics. These courses may be offered online or in person and can be a great way to learn about different data visualization techniques.

Ask Experts

Experts in the field of statistics and data analysis can provide valuable insights into various methods of data representation statistics. This can include professors, researchers, and practitioners.

Use statistical software

Many statistical software packages come with built-in data visualization tools that can be used to explore different methods of data representation statistics. These software packages may also include tutorials and documentation that can provide information on various data visualization techniques.

By utilizing these methods, you can gain a better understanding of the different methods of data representation statistics and how to use them to effectively communicate insights and findings from your data.

Tips for Effective Data Representation

Here are some tips for effective data representation:

  • Choose the right method – Choose the method that best suits your data and your audience.
  • Keep it simple – Use simple language and avoid unnecessary jargon.
  • Be clear and concise – Use clear and concise language to communicate your findings.
  • Use colors and labels – Use colors and labels to make your data more visually appealing and easier to understand.
  • Check your data – Make sure your data is accurate and up-to-date.

Data representation in statistics is essential for making sense of raw data. By converting raw data into a format that is easy to understand and interpret, it helps you identify patterns, trends, and relationships in your data, which in turn supports informed, data-driven decisions. With the right methods and tips, you can represent your data effectively and communicate your findings to others.
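As a minimal illustration of what converting raw data into an easy-to-read format can mean, the Python sketch below tallies a list of categorical values into a frequency table and renders it as a simple text bar chart. The survey responses and the function names `frequency_table` and `text_bar_chart` are hypothetical; statistical software would draw a proper chart, but the underlying steps (counting, then scaling counts to a visual length) are essentially the same.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, count, 100.0 * count / total) for value, count in counts.most_common()]


def text_bar_chart(rows, width=20):
    """Render rows as lines: label, a bar of '#' scaled to the largest count, count and percent."""
    max_count = max(count for _, count, _ in rows)
    lines = []
    for value, count, pct in rows:
        bar = "#" * round(width * count / max_count)
        lines.append(f"{value:<8} {bar:<{width}} {count} ({pct:.0f}%)")
    return lines


if __name__ == "__main__":
    # Hypothetical raw data: how ten respondents travel to school.
    responses = ["bus", "car", "bike", "car", "walk", "car", "bus", "bike", "car", "bus"]
    for line in text_bar_chart(frequency_table(responses)):
        print(line)
```

Sorting by frequency puts the most common category first, which is usually the pattern a reader looks for; percentages are shown alongside counts so that groups of different sizes can be compared.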


Smithsonian Voices

From the Smithsonian Museums


SMITHSONIAN ENVIRONMENTAL RESEARCH CENTER

How Much Can Wetlands Fight Climate Change? A New Carbon Atlas Has The Answers.

The Coastal Carbon Atlas and Library map how wetlands store carbon around the world—and put open data to work for the environment.

Kristen Goodhue

Pools of water flood a green marsh, under a misty sky

When Superstorm Sandy reached New York on Oct. 29, 2012, it pummeled the coastline with 80-mile-per-hour winds, flooding streets and subway tunnels. Sandy caused more than $70 billion in damage along its entire path and ranks among the costliest natural disasters in U.S. history. But in the northeastern U.S., coastal wetlands prevented an estimated $625 million in damage.

The world needs wetlands to protect us from climate change, and not only from extreme weather. Coastal wetlands are champions at storing carbon in their soils: 231 metric tons per hectare on average, according to one estimate.

“Wetlands are pulling a lot of weight for the given amount of area that they take up on the planet,” said Jaxine Wolfe, a research technician with the Smithsonian Environmental Research Center (SERC). “And so there’s a lot of excitement about leveraging these ecosystems for the mitigation of climate change effects. You can do a lot by conserving a particular wetland or restoring it.”

A young woman wearing goggles and purple gloves stands in a lab, preparing a sample on a table with a fume hood

“The conservation of wetlands, while it might have global effects, also has the most localized benefits,” said fellow data technician Henry Betts, citing examples like sustaining fisheries and recreation. “Keeping them healthy and growing can benefit people directly in their everyday lives.”

Wolfe and Betts work on a team illuminating the unique powers of wetlands. Last December, the team unveiled an online database centered on how wetlands store carbon worldwide: The Coastal Carbon Atlas and Library. It contains data from nearly 10,000 soil cores from every continent except Antarctica. Like a true public library, the data are freely available to everyone. And it’s revolutionizing our ability to make predictions about wetlands and climate change.

Carbon Meal Kits

James Holmquist stands in a grassy wetland, wearing field pants, boots and a blue T-shirt caked with mud

The team’s leader is wetland ecologist Jim Holmquist. Holmquist spearheaded the effort at SERC four years ago. Back then, scientists knew wetlands were major carbon sinks, but precise data were hard to find.

The timing couldn’t have been more critical. Under the Paris Agreement, nations must outline exactly how they will slash or offset greenhouse gas emissions. Many nations are relying on wetlands to shoulder part of the load. But no two wetlands are identical. Without country-specific data, governments are often left groping with regional or even global averages.

“Certain countries right now could be greatly overestimating or greatly underestimating the amount of carbon in their soils, simply because they're using a single average,” Holmquist said.

That’s where the true worth of the atlas and library lies. All data are raw, original values, freeing scientists to do any calculations they like without wondering where the numbers came from. Holmquist views it as supplying raw ingredients for researchers to create whatever recipes their communities need.

“This is a meal kit service rather than a restaurant,” he explained.

However, it’s not enough to know how much carbon wetlands store now. To plan for a changing climate, we also need to know how much they will store in the future. Holmquist and others are studying how much faster—or slower—different places accumulate carbon. Some places we rely on as carbon banks now may not be reliable later on.

“A lot of these places, we project coastal wetlands will collapse in response to sea level rise, and these stocks will be big carbon bombs,” Holmquist said. And the climate clock is ticking.

A wetland beneath a cloudy sky, with green and gold plants sprouting up from the water and a tree-lined shore in the background

Disrupting the Blue Carbon Market

Meanwhile, the market has begun catching up to wetlands’ value for people and economies. Maryland, where SERC and the Coastal Carbon Library are based, is pioneering new ways to make wetland conservation profitable.

“Blue carbon”—the name for the precious carbon wetlands store—is among many benefits often overlooked, according to Elliott Campbell of the Maryland Department of Natural Resources.

“The economy is not directly putting a value on it, because it doesn't exist within a traditional market,” he said.

Campbell drew on the Coastal Carbon Library when preparing a blue carbon study for the state, looking at how much carbon Maryland could save by restoring wetlands—or lose by destroying them.

And just two years ago, Maryland passed the Conservation Finance Act of 2022. The law created a “pay for success” program, the first conservation law like it in the U.S. It encourages private investors to finance conservation projects like wetland restoration. After completion, the state repays investors based on tons of carbon stored or other measurable ecosystem services.

But the law only works with solid data, which the Coastal Carbon Library can help provide.

“If we don't have strong science supporting the market mechanism, then how do we know we really are paying for something that's real?” asked Rachel Lamb, senior climate advisor for the Maryland Department of the Environment. “Good science is critical.”

Representation Matters

At the time of this writing, the Coastal Carbon Atlas has data from 9,804 soil cores around the globe.

“We’re going to pop a bottle of champagne once we pass 10,000,” Holmquist said. That said, version 1.0 has several gaps the team hopes to rectify in future releases. Over half the data come from North America alone.

“We definitely don't have enough data on tropical or Arctic marshes,” said Tania Maxwell of Austria’s International Institute for Applied Systems Analysis. Maxwell assembled data from over 2,000 sites, many outside North America, during a University of Cambridge fellowship. Those data are slated to join the library and atlas this spring. But cold hard data from the frigid north may have to wait.

“The Arctic is a hard place to study,” Maxwell said. “Having done field sampling and know people who've sampled in the Arctic, it's a huge, huge, huge endeavor.”

Including more data from developing nations is another high priority. Wolfe spent the past year conducting trainings in Ghana and Costa Rica, while fellow SERC technician Rose Cheney gathered data from South Africa. Though many scientists are interested in sharing data, the barriers can be legion: time, skills, language, whether another government owns the data and concern about credit.

At its heart, the database is built on trust, Wolfe said. She and Cheney emphasize consistently that the original authors get credit for any data they share. Anyone who uses the data likewise must credit them.

“Even though this is hosted through and served through the Coastal Carbon Library and Atlas, this is still their work,” Cheney said. “We just are trying to help them make it more accessible.”

Two scientists in purple gloves stand beside a lab table. One is grinding a sample in a white bowl while the other looks on and smiles.

Sometimes just having the right equipment can pose a problem. To help on that front, SERC and Silvestrum Climate Associates are processing hundreds of soil samples from Sierra Leone.

As the team prepares for the next release, the tropical representation gap is already closing, Holmquist said. In the meantime, he hopes governments of every level—local, state and federal—can use the data to incentivize wetland conservation.

“A lot of people think blue carbon, they think, oh, we can get carbon credits. Carbon's money,” Holmquist said. “But I think of carbon as life. I think of it as a structural component of the ecosystem that we need to understand, to project how these things are going to survive.”


Kristen Goodhue is the science writer and social media manager at the Smithsonian Environmental Research Center, headquartered on Chesapeake Bay in Edgewater, Maryland. She received an M.S. in Journalism at Northwestern University, with a focus in science reporting, and a B.A. in English at Pomona College. Since joining the Smithsonian in 2011, she’s written stories about endangered orchids, marine parasites and a “wetland of the future” that mimics the world of 2100. (Photo courtesy of Kristen Goodhue)

New Data Provides a Closer Look at Race & Gender Representation

New data provides a closer look at race and gender in children’s books: research by TC’s Alex Eble and coauthors analyzes distribution trends in storytelling.

Students reading a book in classroom.

Books are the building blocks from which children learn; they are windows into creative spaces that inspire, educate and encourage and act as informative learning tools in the classroom.

Yet despite their educational value, researchers are grappling with how books systematically underrepresent race and gender and how this impacts youth. That’s where TC’s Alex Eble, Associate Professor of Economics and Education, and team are working to fill in the gaps with their latest research, which examines representation and identity in children’s books using tools like artificial intelligence and computational methods.

“The main question we’re asking is who is represented in the books we most often use to teach children, and why are we seeing these patterns,” explains Eble, who builds on his previous work shedding light on how the economics of education can help understand persistent inequality by gender and other historical sites of exclusion.

We sat down with Eble to discuss his findings.


Alex Eble, Associate Professor of Economics and Education at Teachers College. (Photo: TC Archives)

In your latest study, you use new tools and methods to assess a topic you’ve explored previously in your work: representation in children’s books and the impact of distribution trends. Can you tell us about the process and your findings?

AE: We assessed over 1,000 books that have garnered acclaim from a century of children's book awards. Our analysis concentrated on two primary categories of books for children aged 14 and under: "mainstream" and "diversity" books.

After using advanced tools to identify over 44,000 characters in these books, we found that despite equal population shares, men are more commonly represented than women in both pictures and words. We also found that white populations were represented more frequently than Black and Latino populations. Interestingly enough, children were represented with lighter skin color than adults, even conditional on race (i.e., Black children have lighter skin than Black adults). It was eye-opening and made us wonder what else we might reveal through this sort of work in the future.

That’s fascinating. How did artificial intelligence and computational methods help uncover these discoveries?

AE: Artificial intelligence played a crucial role in this project—it would not have been possible without it. We needed a tool to measure representation in children's books on a very detailed, micro-scale. It would be impractical for parents and teachers to thoroughly vet every book before offering it to their children or students, a task made even more challenging for librarians, superintendents, or policymakers—and that’s where we wanted to develop a solution.

Over the last three years, we have spent time building a tool using computational resources in New York and Chicago. It [the tool] uses artificial intelligence to scan images in picture books, turning the images into representation data based on gender, identity, and race. The data that it revealed would not have been evident without the detail that artificial intelligence was able to provide.

Talk more about “Mainstream” books versus “Diversity” books. How do they impact children in the classroom?

AE: Mainstream books are super common in libraries, school curricula, and homes. They are books recognized by the most prestigious children’s book awards: the Newbery and Caldecott. These books have profound recognition and influence; however, our research reveals that they are not very representative, with main characters often depicted with lighter skin. “Diversity” books focus more on centering previously excluded identities and are often recognized for their artistic or literary value. 

Put simply, male and white children encounter more representations of themselves in storytelling compared to underrepresented groups of children, regardless of what they are reading. This discrepancy persists across both collections of books, even in Diversity books, which are intended to do quite the opposite. Even in books designed to highlight and celebrate the experiences of Black children, these children are still less likely to see themselves represented. These patterns can shape children’s beliefs about where they and others do or do not belong in the world and, unfortunately, it’s not getting better nearly as fast as we would hope.


Your research also discusses economic trends and consumer behavior in relation to identity representation. What should we know about these developments?

AE: This is where parts of my previous research came into play, and this was really cool to see. It’s no secret that people tend to buy books that center their gender and racial identities. However, we found that books that center on dominant identities (typically white men) are more likely to be sold at a higher volume and a lower price, indicating greater demand for them than for other books. In contrast, books that center on non-dominant identities have less consumer demand and are actually priced higher.

Our research indicates a correlation between the content of purchased books in a particular area and the political inclinations of the community. We uncovered that parents are likely to buy books for their kids that represent “their version” or perspective of the world. For example, we see this with politics and media splitting: if you lean conservative, you’ll likely watch Fox News. If you lean more liberal, you’ll watch MSNBC. The same theory applies to books that parents are buying for their children.

How can communities work together to make storytelling more inclusive?

AE: As educators, writers, illustrators, and community leaders, we have a great responsibility to understand how exposure to this content can influence children’s trajectories. How does it affect what classes they take? How does it affect how they see themselves as adults in the world? We’re already taking the first step, which is using tools that can measure and evaluate book content at a very large scale and identifying the disparities, but there's more work to be done.

Educators need more tools and resources, like the one we utilized for this project (think almost a central database), to better measure what’s actually in the books we’re reading and promoting to our children. They need more support from communities at large, starting at a policy level. 

— Jacqueline Teschon


Published Thursday, Mar 28, 2024

Teachers College Newsroom



Title: HGS-Mapping: Online Dense Mapping Using Hybrid Gaussian Representation in Urban Scenes

Abstract: Online dense mapping of urban scenes forms a fundamental cornerstone for scene understanding and navigation of autonomous vehicles. Recent advancements in mapping methods are mainly based on NeRF, whose rendering speed is too slow to meet online requirements. 3D Gaussian Splatting (3DGS), with rendering hundreds of times faster than NeRF, holds greater potential for online dense mapping. However, integrating 3DGS into a street-view dense mapping framework still faces two challenges: incomplete reconstruction due to the absence of geometric information beyond the LiDAR coverage area, and extensive computation for reconstruction in large urban scenes. To this end, we propose HGS-Mapping, an online dense mapping framework for unbounded large-scale scenes. To achieve complete reconstruction, our framework introduces Hybrid Gaussian Representation, which models different parts of the entire scene using Gaussians with distinct properties. Furthermore, we employ a hybrid Gaussian initialization mechanism and an adaptive update method to achieve high-fidelity and rapid reconstruction. To the best of our knowledge, we are the first to integrate Gaussian representation into online dense mapping of urban scenes. Our approach achieves SOTA reconstruction accuracy while using only 66% of the number of Gaussians, leading to 20% faster reconstruction.
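The abstract builds on 3D Gaussian Splatting, where a scene is represented by many anisotropic 3D Gaussians. As background only, the Python sketch below shows the generic primitive from standard 3DGS: a mean, per-axis scales, a rotation quaternion and an opacity, with covariance Σ = R S Sᵀ Rᵀ. It is not the paper's Hybrid Gaussian Representation, whose details the abstract does not give; colour (spherical-harmonic coefficients), projection to the image plane and rasterization are left out, and the names `Gaussian3D`, `quat_to_matrix` and `contribution` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian3D:
    mean: tuple      # (x, y, z) centre
    scale: tuple     # per-axis standard deviations (sx, sy, sz), all > 0
    rotation: tuple  # unit quaternion (w, x, y, z), assumed already normalized
    opacity: float   # peak opacity in [0, 1]


def quat_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def contribution(g, p):
    """Opacity-weighted falloff at point p: opacity * exp(-0.5 * d^T Sigma^-1 d).

    With Sigma = R S S^T R^T, the Mahalanobis term equals the squared length of
    S^-1 R^T d, so no explicit matrix inverse is needed.
    """
    R = quat_to_matrix(g.rotation)
    d = [p[i] - g.mean[i] for i in range(3)]
    # R^T d: express the offset in the Gaussian's local axes (the columns of R)
    local = [sum(R[r][c] * d[r] for r in range(3)) for c in range(3)]
    m = sum((local[c] / g.scale[c]) ** 2 for c in range(3))
    return g.opacity * math.exp(-0.5 * m)
```

At the mean, `contribution` returns the Gaussian's opacity; the value falls off faster along axes with smaller scale, which is one reason anisotropic Gaussians can cover thin or elongated surfaces with relatively few primitives.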


FlexSim: 3D discrete event simulation software

Easy-to-use 3D simulation modeling and analysis software with high-end capabilities


What is FlexSim?

FlexSim is easy-to-use 3D discrete event simulation software with high-end capability.

Drag-and-drop workflows to easily model production and people movement processes.

Built-in scenario manager to run experiments, make accurate predictions, and optimize.

Pre-packaged with modules to add conveyor systems, automated guided vehicles (AGVs), warehousing systems, supply chain, healthcare, and more.
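Discrete event simulation, the technique named above, advances a clock from one scheduled event to the next instead of stepping through time uniformly. The Python sketch below illustrates that core loop for a single server with a first-in, first-out queue and a fixed service time. It is a generic textbook example under those assumptions, not FlexSim's engine or API, and `simulate_single_server` is a hypothetical name.

```python
import heapq
from collections import deque


def simulate_single_server(arrival_times, service_time):
    """Simulate one server with a FIFO queue and a fixed service time.

    Returns a dict mapping job id -> (start_time, finish_time).
    """
    events = []  # min-heap of (time, seq, kind, job_id)
    seq = 0      # tie-breaker: equal-time events pop in the order they were scheduled
    for job_id, t in enumerate(arrival_times):
        heapq.heappush(events, (t, seq, "arrive", job_id))
        seq += 1

    waiting = deque()  # jobs waiting for the server, oldest first
    busy = False
    start_times, finish_times = {}, {}

    while events:
        # Jump the clock straight to the next scheduled event
        clock, _, kind, job_id = heapq.heappop(events)
        if kind == "arrive":
            waiting.append(job_id)
        else:  # "depart": the server has finished job_id
            finish_times[job_id] = clock
            busy = False
        # If the server is idle and a job is waiting, start it now
        if not busy and waiting:
            nxt = waiting.popleft()
            start_times[nxt] = clock
            heapq.heappush(events, (clock + service_time, seq, "depart", nxt))
            seq += 1
            busy = True

    return {j: (start_times[j], finish_times[j]) for j in start_times}
```

For example, `simulate_single_server([0, 1, 2], 3)` returns `{0: (0, 3), 1: (3, 6), 2: (6, 9)}`: jobs 1 and 2 wait in the queue while the server is busy. Tools like the one described here layer layouts, object libraries and 3D visualization on top of this kind of event scheduling.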


Why use FlexSim?

Accurate predictions and optimizations

A data-driven, evidence-based method to predict how changes will impact production

Minimize risks and disruption

Test “what if” scenarios in a digital model without disrupting real-world operations

Complete, collaborative digital representation

Realistic 3D visuals and process steps to share the factory story with stakeholders

What you can do with FlexSim


Tell the complete factory story

Create a data-driven, evidence-based model that combines detailed process steps, accurate production data, and realistic 3D visuals to tell the complete factory story and engage with all stakeholders.


Answer “what if” questions and respond to change

FlexSim provides a risk-free virtual environment to experiment with different scenarios, helping you to validate designs, plan for unfamiliar conditions, and discover optimal solutions.


Get accurate results faster

With drag-and-drop layout, a full library of objects with pre-built logic, and a code-free logic-building environment, FlexSim significantly reduces the time it takes to build accurate simulation models.

thredUP

“The FlexSim model allowed us to do optimization runs where we could zero in on how many people we needed on each level and how big a pick wave we can run.”

– John Friedl, Senior Vice President of Automation and Innovation, thredUP

LM Wind Power

“FlexSim gives you a quick and rough, high-level idea of which brainstormed ideas will give the best output. Then narrow down to 2-3 options and make detailed models in a more focused way.”

– Michael Belote, Director of Manufacturing 2.0, LM Wind Power


“The beauty of the simulation model is that we have an immersive environment to virtually experience the workspace while conducting what-if scenarios on the system.”

– Jason Merschat, President, Advanced Process Optimization, Inc.

FlexSim learning resources

DOCUMENTATION

FlexSim Documentation

Full documentation and reference for FlexSim, including tutorials, release notes, and more.

FlexSim Answers

FlexSim’s Q&A knowledge base and support site, with a vast database of simulation knowledge.

Frequently asked questions (FAQs)

What is FlexSim used for?

FlexSim is used to model production, logistics, and people movement processes, and use that model to visualize, analyze, and optimize the system. FlexSim users could be designing a new factory layout, or they could be responding to a bottleneck on the production floor, or they could be validating a reconfiguration plan to predict its impact on future operations.

Who uses FlexSim?

FlexSim users are typically focused on process improvement and often work in an industrial engineering or manufacturing engineering role. Anyone who wants to better understand or improve production, logistics, and people movement processes can benefit from FlexSim—including project managers, analysts, quality assurance, operations, health systems engineers, Lean/Six Sigma Black Belts, and more.

How does FlexSim fit in with digital twin solutions?

At its core, a digital twin is a virtual representation of a physical process. FlexSim’s core competency is creating 3D models that represent the look and behavior of your existing system, and these models simulate how the actual system would respond using different inputs and layouts. For advanced digital twin solutions, FlexSim can pull in data at regular (or even real-time) intervals and run simulations.

Support & learning

Get FlexSim documentation, tutorials, downloads, and support.

© 2024 Autodesk Inc. All rights reserved.


Kaitlyn Schmidt | NCAA.com | March 31, 2024

Tracking 2024 March Madness men's records by conference


There are 10 conferences with two or more teams in the 2024 March Madness men's tournament. You can track their progress below.

Here are the current conference-by-conference standings, updated through the Elite Eight round. Last year, the Big East went 12-4, with UConn winning the national championship.

2024 NCAA tournament schedule, scores, highlights

Saturday, April 6 (Final Four)

  • (1) Purdue vs. (11) NC State | 6:09 p.m. ET | TBS/TNT/truTV
  • (1) UConn vs. (4) Alabama | 9:20 p.m. ET | TBS/TNT/truTV

Monday, April 8 (National championship game)

  • TBD vs. TBD | 9:20 p.m.

Tuesday, March 19 (First Four in Dayton, Ohio)

  • (16) Wagner 71 , (16) Howard 68
  • (10) Colorado State 67 , (10) Virginia 42

Wednesday, March 20 (First Four in Dayton, Ohio)

  • (16) Grambling 88 , (16) Montana State 81
  • (10) Colorado 60 , (10) Boise State 53

Thursday, March 21 (Round of 64)

  • (9) Michigan State 66 , (8) Mississippi State 51
  • (11) Duquesne 71 , (6) BYU 67
  • (3) Creighton 77 , (14) Akron 60
  • (2) Arizona 85 , (15) Long Beach State 65
  • (1) North Carolina 90 , (16) Wagner 61
  • (3) Illinois 85 , (14) Morehead State 69
  • (11) Oregon 87 , (6) South Carolina 73
  • (7) Dayton 63 , (10) Nevada 60
  • (7) Texas 56 , (10) Colorado State 44
  • (14) Oakland 80 , (3) Kentucky 76
  • (5) Gonzaga 86 , (12) McNeese 65
  • (2) Iowa State 82 , (15) South Dakota State 65
  • (2) Tennessee 83 , (15) Saint Peter's 49
  • (7) Washington State 66 , (10) Drake 61
  • (11) NC State 80 , (6) Texas Tech 67
  • (4) Kansas 93 , (13) Samford 89

Friday, March 22 (Round of 64)

  • (3) Baylor 92 , (14) Colgate 67
  • (9) Northwestern 77 , (8) Florida Atlantic 65 (OT)
  • (5) San Diego State 69 , (12) UAB 65
  • (2) Marquette 87 , (15) Western Kentucky 69
  • (1) UConn 91 , (16) Stetson 52
  • (6) Clemson 77 , (11) New Mexico 56
  • (10) Colorado 102 , (7) Florida 100   
  • (13) Yale 78 , (4) Auburn 76 
  • (9) Texas A&M 98 , (8) Nebraska 83
  • (4) Duke 64 , (13) Vermont 47
  • (1) Purdue 78 , (16) Grambling 50
  • (4) Alabama 109 , (13) College of Charleston 96
  • (1) Houston 86 , (16) Longwood 46
  • (12) James Madison 72 , (5) Wisconsin 61
  • (8) Utah State 88 , (9) TCU 72 
  • (12) Grand Canyon 77 , (5) Saint Mary's 66

Saturday, March 23 (Round of 32)

  • (2) Arizona 78 , (7) Dayton 68
  • (5) Gonzaga 89 , (4) Kansas 68
  • (1) North Carolina 85 , (9) Michigan State 69
  • (2) Iowa State 67 , (7) Washington State 56
  • (11) NC State 79 , (14) Oakland 73
  • (2) Tennessee 62 , (7) Texas 58
  • (3) Illinois 89 , (11) Duquesne 63 
  • (3) Creighton 86 , (11) Oregon 73 (2OT)

Sunday, March 24 (Round of 32)

  • (2) Marquette 81 , (10) Colorado 77
  • (1) Purdue 106 , (8) Utah State 67
  • (4) Duke 93 , (12) James Madison 55 
  • (6) Clemson 72 , (3) Baylor 64
  • (4) Alabama 72 , (12) Grand Canyon 61
  • (1) UConn 75 , (9) Northwestern 58
  • (1) Houston 100 , (9) Texas A&M 95 (OT)
  • (5) San Diego State 85 , (13) Yale 57 

Thursday, March 28 (Sweet 16)

  • (6) Clemson 77 , (2) Arizona 72
  • (1) UConn 82 , (5) San Diego State 52
  • (4) Alabama 89 , (1) North Carolina 87
  • (3) Illinois 72 , (2) Iowa State 69

Friday, March 29 (Sweet 16)

  • (11) NC State 66 , (2) Marquette 58
  • (1) Purdue 80 , (5) Gonzaga 68
  • (4) Duke 54 , (1) Houston 51
  • (2) Tennessee 82 , (3) Creighton 75

Saturday, March 30 (Elite Eight)

  • (1) UConn 77 , (3) Illinois 52
  • (4) Alabama 89 , (6) Clemson 82

Sunday, March 31 (Elite Eight)

  • (1) Purdue 72 , (2) Tennessee 66
  • (11) NC State 76 , (4) Duke 64


COMMENTS

  1. How do computers represent data?

    When we look at a computer, we see text, images and shapes. To a computer, all of that is just binary data: 1s and 0s. One string of 1s and 0s can represent a tiny GIF image, while another can represent a command to add a number.

  2. Data Representation: Definition, Types, Examples

    Data representation is a technique for analysing numerical data: it depicts the relationship between facts, ideas, information and concepts in a diagram. It is a fundamental learning strategy that is simple and easy to understand, and it is always determined by the type of data in a specific domain.

  3. PDF Data Representation

    Eric Roberts, CS 106A, February 10, 2016. Claude Shannon was one of the pioneers who shaped computer science in its early years. In his master's thesis, Shannon showed how it was possible to use Boolean logic and switching circuits to perform arithmetic calculations. That work led ...

  4. Data representation

    Data representation. Computers use binary - the digits 0 and 1 - to store data. A binary digit, or bit, is the smallest unit of data in computing. It is represented by a 0 or a 1. Binary numbers are made up of binary digits (bits), eg the binary number 1001. The circuits in a computer's processor are made up of billions of transistors.

  5. Data representation 1: Introduction

    The operating system and hardware ensure that data in this segment is not changed during the lifetime of the program. Any attempt to modify data in the code segment will cause a crash. i1, the int global object, has the next highest address. It is in the data segment, which holds modifiable global data. This segment keeps the same size as the ...

  6. Decoding Computation Through Data Representation

    Data representation is fundamentally about how information is encoded for computation. Every piece of information, from a simple number to a complex structure like a video or an image, must be represented so that computers can understand, manipulate and store it. At the same time, we as humans rely on abstractions of data primitives for ...

  7. PDF Lecture Notes on Data Representation

    The constructor for elements of recursive types is fold, while unfold destructs elements:

    $\dfrac{\vdash e : [\rho\alpha.\tau/\alpha]\tau}{\vdash \mathsf{fold}\ e : \rho\alpha.\tau} \qquad \dfrac{\vdash e : \rho\alpha.\tau}{\vdash \mathsf{unfold}\ e : [\rho\alpha.\tau/\alpha]\tau}$

    This "unfolding" of the recursion seems like a strange operation, and it is. For example, for all other data constructors the components have a smaller ...

  8. PDF Data Representation

    Because of this property, numbers that are a power of two are very, very common when talking about computers. Table 1.1 (Powers of Two and their Binary Representation) shows powers of two up to 2^10 and their binary representation. The powers of two show up repeatedly in the sizes of different objects ...

  9. Explore how computers represent text, numbers, images and sound and how

    This guide to data representation covers all the key concepts you need to know to understand the principles of representing data in computer systems. Whether you're a GCSE, IB or A-level computer science student, our guide provides a detailed explanation of how data is represented in binary, hexadecimal, and ASCII formats, as well as the ...

  10. Data Representation

    Information handled by a computer is classified as instructions and data. A broad overview of the internal representation of the information is illustrated in figure 3.1. Whether the data is numeric (integer or otherwise) or non-numeric, everything is internally represented in binary.

  11. Data representations

    Data representation problems ask us to interpret data representations or to create them from given information. Aside from tables, the two most common data representation types on the SAT are bar graphs and line graphs.

  12. Data representation

    The problem of data representation is the problem of representing all the concepts we might want to use in programming—integers, fractions, real numbers, sets, pictures, texts, buildings, animal species, relationships—using the limited medium of addresses and bytes. Powers of ten and powers of two.

  13. Data Representation in Computer: Number Systems, Characters

    A computer uses a fixed number of bits to represent a piece of data which could be a number, a character, image, sound, video, etc. Data representation is the method used internally to represent data in a computer. Let us see how various types of data can be represented in computer memory. Before discussing data representation of numbers, let ...

  14. PDF Chapter 4: Data Representations

    Goal: to store numbers, characters, etc. in a computer. Location: a memory location, that is, a box or container that can hold a value. Memory is just a one-dimensional array of these boxes, and an address is just the array index. Concentrating on one box, binary representation means representing all information using only 0s and 1s (low/high voltage ...

  15. 2.1: Types of Data Representation

    Two common types of graphic displays are bar charts and histograms. Both bar charts and histograms use vertical or horizontal bars to represent the number of data points in each category or interval. The main difference graphically is that in a bar chart there are spaces between the bars and in a ...

  16. Introduction to Data Representation

    Computers are classified broadly based on their speed and computing power. 1. Microcomputers or PCs (Personal Computers): a single-user computer system with a medium-power microprocessor. It is so called because its central processing unit is a microprocessor.

  17. PDF Data Representation

    Data refers to the symbols that represent people, events, things, and ideas. Data can be a name, a number, the colors in a photograph, or the notes in a musical composition. Data representation refers to the form in which data is stored, processed, and transmitted. Devices such as smartphones, iPods, and ...

  18. What are the different ways of Data Representation?

    The word data refers to facts about people, things, events and ideas. It can be a title, an integer, or anything else. After collecting data, the investigator has to condense it into tabular form to study its salient features. Such an arrangement is known as the presentation of data.

  19. What is Data Representation?

    Learn more about Data Representation. Take a deep dive into Data Representation with our course AI for Designers . In an era where technology is rapidly reshaping the way we interact with the world, understanding the intricacies of AI is not just a skill, but a necessity for designers. The AI for Designers course delves into the heart of this ...

  20. Master The Art of Data Representation Statistics

    Data representation statistics is the process of converting raw data into a format that is easy to understand and interpret. This involves using various statistical methods to analyze and summarize the data. Data representation statistics can help you identify patterns, trends, and relationships in your data, which can help ...

  21. Data Representation in Computer Organization

    Data can be anything, such as a number, a name, the notes in a musical composition, or the color in a photograph. Data representation refers to the form in which data is stored, processed and transmitted. Data can be stored in digital form on devices such as computers, smartphones and iPads.

  22. Data Representation in Computer Network

    A network is a collection of different devices that are connected and capable of communicating. For example, a company's local network connects employees' computers and devices such as printers and scanners. Employees can share information over the network and also use the shared printer or scanner through it.

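Item 1 ("How do computers represent data?") makes the point that the same kind of 1s and 0s can be an image or an instruction. What a bit pattern means depends on how it is read.

The Python sketch below illustrates this with the ASCII signature that begins every GIF file. It reads the first four bytes of that signature in two ways: as text, and as a 32-bit big-endian unsigned integer. Both the choice of signature and the byte order are ours, for illustration only. Nothing else of a real GIF file is shown.

```python
import struct

# Every GIF file begins with the ASCII signature "GIF87a" or "GIF89a".
header = b"GIF89a"
first_four = header[:4]

# The raw bits of those four bytes.
print(" ".join(format(byte, "08b") for byte in first_four))  # 01000111 01001001 01000110 00111000

# The same bits read as ASCII text...
print(first_four.decode("ascii"))          # GIF8

# ...and as one big-endian unsigned 32-bit integer (">I" in the struct module).
print(struct.unpack(">I", first_four)[0])  # 1195984440
```

Nothing about the bytes themselves says "text" or "number". The program reading them decides.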
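Item 3 recalls Shannon's result that Boolean logic and switching circuits can perform arithmetic.

As a minimal illustration (not Shannon's own construction), here is a Python sketch of a full adder built only from AND, OR and XOR, chained into a ripple-carry adder. The function names `full_adder` and `ripple_carry_add`, and the 8-bit default width, are hypothetical choices made for this example.

```python
from typing import Tuple


def full_adder(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """Hypothetical helper: add three single bits with Boolean operations only.

    Returns (sum_bit, carry_out).
    """
    sum_bit = a ^ b ^ carry_in                  # XOR: 1 when an odd number of inputs are 1
    carry_out = (a & b) | (carry_in & (a ^ b))  # carry when at least two inputs are 1
    return sum_bit, carry_out


def ripple_carry_add(x: int, y: int, width: int = 8) -> int:
    """Hypothetical helper: add two unsigned integers bit by bit, like a chain of full adders.

    The result wraps around modulo 2**width, as fixed-width hardware does.
    """
    result = 0
    carry = 0
    for i in range(width):
        a = (x >> i) & 1  # i-th bit of x
        b = (y >> i) & 1  # i-th bit of y
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result  # a carry out of the top bit is dropped (overflow)


print(ripple_carry_add(0b0101, 0b0011))  # 5 + 3 = 8
print(ripple_carry_add(200, 100))        # 300 wraps around to 44 in 8 bits
```

Each loop iteration plays the role of one adder circuit. The carry passed between iterations is the wire between neighbouring adders.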
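Item 4 gives 1001 as an example binary number. Each bit is worth a power of two, so 1001 is 8 + 0 + 0 + 1 = 9.

A short Python sketch of the conversion in both directions follows, assuming unsigned place-value notation. `binary_to_decimal` and `decimal_to_binary` are names made up for illustration. Python's built-ins `int(s, 2)` and `bin(n)` do the same job.

```python
def binary_to_decimal(bits: str) -> int:
    """Hypothetical helper: convert a string of 0s and 1s to its unsigned value."""
    value = 0
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        # Shift the running total one place left (multiply by 2), then add the new bit.
        value = value * 2 + int(bit)
    return value


def decimal_to_binary(n: int) -> str:
    """Hypothetical helper: convert a non-negative integer to a binary string."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next bit, least significant first
        n //= 2
    return "".join(reversed(digits))


print(binary_to_decimal("1001"))  # 9
print(decimal_to_binary(9))       # 1001
print(int("1001", 2), bin(9))     # 9 0b1001
```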
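Item 8 refers to a table of powers of two up to 2^10 with their binary representation. The table itself did not survive extraction.

The following Python sketch regenerates such a table. It relies on the fact that 2^n in binary is a 1 followed by n zeros. The helper name `powers_of_two_table` is hypothetical.

```python
def powers_of_two_table(max_exponent: int = 10):
    """Hypothetical helper: return (exponent, value, binary) rows for 2**0 .. 2**max_exponent."""
    rows = []
    for exponent in range(max_exponent + 1):
        value = 1 << exponent  # shifting 1 left by n places gives 2**n
        rows.append((exponent, value, format(value, "b")))
    return rows


for exponent, value, binary in powers_of_two_table():
    print(f"2^{exponent:<2} = {value:>4}   {binary:>11}")
```

The last row is 2^10 = 1024, written 10000000000 in binary.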
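Item 9 mentions binary, hexadecimal and ASCII formats. The same ideas explain how hexadecimal colour codes relate to the number of colours available: a `#RRGGBB` code has two hex digits (8 bits) per channel, so 24 bits in total and 2^24 colours.

The Python sketch below shows both. The helper names `describe_text` and `hex_colour_to_rgb` are hypothetical, and the text helper deliberately rejects anything outside 7-bit ASCII.

```python
def describe_text(text: str):
    """Hypothetical helper: (character, code, 8-bit binary, hex) for each ASCII character."""
    rows = []
    for ch in text:
        code = ord(ch)  # the character's code point (equal to its ASCII code below 128)
        if code > 127:
            raise ValueError(f"{ch!r} is outside 7-bit ASCII")
        rows.append((ch, code, format(code, "08b"), format(code, "02X")))
    return rows


def hex_colour_to_rgb(colour: str):
    """Hypothetical helper: split a '#RRGGBB' colour into red, green and blue byte values."""
    colour = colour.lstrip("#")
    if len(colour) != 6:
        raise ValueError("expected six hex digits")
    return tuple(int(colour[i:i + 2], 16) for i in (0, 2, 4))


for row in describe_text("Hi"):
    print(row)  # ('H', 72, '01001000', '48') then ('i', 105, '01101001', '69')

print(hex_colour_to_rgb("#FF8800"))  # (255, 136, 0)
# 2 hex digits = 8 bits per channel; 3 channels = 24 bits.
print(2 ** 24)                       # 16777216 possible colours
```

Each extra bit doubles the number of colours. Dropping to, say, 4 bits per channel would leave only 2^12 = 4096 colours.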
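Item 10 says that numeric and non-numeric data alike are represented internally in binary. The same value can still have very different bit patterns depending on its type.

A Python sketch follows that stores five as a 4-byte signed integer, as a 4-byte IEEE 754 float, and as the ASCII character "5". The big-endian byte order (`>` in the `struct` format string) is a choice made to keep the output readable, and `to_bits` is a hypothetical helper.

```python
import struct


def to_bits(data: bytes) -> str:
    """Hypothetical helper: render bytes as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in data)


# ">" = big-endian with standard sizes; "i" = 4-byte signed int, "f" = 4-byte IEEE 754 float.
int_bytes = struct.pack(">i", 5)
float_bytes = struct.pack(">f", 5.0)
text_bytes = "5".encode("ascii")

print(to_bits(int_bytes))    # 00000000 00000000 00000000 00000101
print(to_bits(float_bytes))  # 01000000 10100000 00000000 00000000
print(to_bits(text_bytes))   # 00110101
```

The float pattern splits into a sign bit 0, an exponent field 10000001 (129, meaning 2^2 after subtracting the bias of 127), and a fraction field .01 (binary for 0.25). Together these give 1.25 × 4 = 5.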
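Item 12 describes representing everything with addresses and bytes, and mentions both powers of ten and powers of two. Sizes in computing are quoted both ways: decimal (SI) prefixes such as kB use powers of ten, while binary (IEC) prefixes such as KiB use powers of two.

The Python sketch below compares the two. The variable names are ours, and the "1 TB drive" is an assumed example of a size quoted with a decimal prefix.

```python
# Decimal (SI) prefixes: powers of ten. Binary (IEC) prefixes: powers of two.
KB, MB, GB, TB = (10 ** (3 * n) for n in range(1, 5))
KiB, MiB, GiB, TiB = (2 ** (10 * n) for n in range(1, 5))

print(KB, KiB)                      # 1000 1024
print(MB, MiB)                      # 1000000 1048576

drive_bytes = 1 * TB                # an assumed drive size quoted as "1 TB"
print(round(drive_bytes / TiB, 3))  # 0.909, i.e. about 0.909 TiB
```

The gap grows with each prefix step (2.4% at kilo, about 10% at tera), which is why the same drive can appear "smaller" when reported in binary units.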
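Item 13 notes that a computer uses a fixed number of bits for each piece of data. For integers, the bit width fixes the range of values. With n bits there are 2^n patterns, giving 0 to 2^n − 1 unsigned, or −2^(n−1) to 2^(n−1) − 1 in two's complement.

A Python sketch under that assumption follows. The helper names are hypothetical.

```python
def unsigned_range(bits: int):
    """Hypothetical helper: smallest and largest n-bit unsigned values."""
    return 0, 2 ** bits - 1


def twos_complement_range(bits: int):
    """Hypothetical helper: smallest and largest n-bit two's complement values."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def to_twos_complement(value: int, bits: int) -> str:
    """Hypothetical helper: encode a signed integer as an n-bit two's complement string."""
    low, high = twos_complement_range(bits)
    if not low <= value <= high:
        raise OverflowError(f"{value} does not fit in {bits} bits")
    # Masking with 2**bits - 1 maps a negative value onto its two's complement pattern.
    return format(value & (2 ** bits - 1), f"0{bits}b")


print(unsigned_range(8))          # (0, 255)
print(twos_complement_range(8))   # (-128, 127)
print(to_twos_complement(5, 8))   # 00000101
print(to_twos_complement(-5, 8))  # 11111011
```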
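Item 14 describes memory as a one-dimensional array of boxes, where the address is just the array index. The Python sketch below models that with a `bytearray` of one-byte boxes and stores a 16-bit value across two neighbouring addresses.

The 16-bit width, the little-endian byte order (low byte at the lower address, as on x86; other processors differ), and the helper names `store_u16` and `load_u16` are all assumptions for illustration.

```python
memory = bytearray(16)  # 16 one-byte boxes, all initially 0; the index is the address


def store_u16(mem: bytearray, address: int, value: int) -> None:
    """Hypothetical helper: store a 16-bit unsigned value in two boxes, low byte first."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    mem[address] = value & 0xFF   # low byte at the lower address
    mem[address + 1] = value >> 8  # high byte at the next address


def load_u16(mem: bytearray, address: int) -> int:
    """Hypothetical helper: read back a 16-bit value stored little-endian at `address`."""
    return mem[address] | (mem[address + 1] << 8)


store_u16(memory, 4, 1000)   # 1000 = 0x03E8
print(memory[4], memory[5])  # 232 3
print(load_u16(memory, 4))   # 1000
```

Python's `int.from_bytes(memory[4:6], "little")` reads the same value. This shows that byte order is a convention agreed between writer and reader.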