Data flow book pdf




















The start of each infeasible path can be delayed in the forward direction, as specified by the definition below. Definition 2. Determining shortest infeasible paths maximizes the number of def-use pairs that can be excluded during def-use analysis because more def-use pairs can span shorter infeasible paths. The central idea behind placing the start mark is that, if a query at a node n has a single answer, then the start of the corresponding path can be delayed past n because only infeasible paths enter n.

However, if a query has multiple answers at n, the start must precede n because some feasible paths may pass through n. Thus the start mark is placed at the edge where the query has a single answer but the destination node of the edge has multiple answers to the query. The marks for the shortest infeasible paths leading to branch b are placed in Step 3 of the algorithm. The end marks are always placed on the out-edges of the conditional branch. The present marks are placed at all nodes where a query was raised. Throughout the remainder of the paper, the term infeasible path refers to the shortest infeasible path.
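The placement rule for start marks can be sketched as follows. This is a minimal illustration rather than the paper's algorithm: the function name `place_start_marks`, the maps `edge_answers` and `node_answers`, and the answer labels are assumptions introduced here. Only the rule itself (a single answer on the edge, multiple answers at the edge's destination) comes from the text.

```python
def place_start_marks(edges, edge_answers, node_answers):
    """Return the edges that receive a start mark.

    edges        -- iterable of (source, destination) pairs
    edge_answers -- hypothetical map: edge -> set of query answers on that edge
    node_answers -- hypothetical map: node -> set of query answers at that node
    """
    marks = set()
    for edge in edges:
        _, dest = edge
        single_on_edge = len(edge_answers.get(edge, set())) == 1
        multiple_at_dest = len(node_answers.get(dest, set())) > 1
        if single_on_edge and multiple_at_dest:
            marks.add(edge)
    return marks


# Toy data (invented for illustration): node 3 merges two paths that
# carry different answers to the same query.
edges = [(1, 3), (2, 3), (3, 4)]
edge_answers = {(1, 3): {"TRUE"}, (2, 3): {"FALSE"}, (3, 4): {"TRUE", "FALSE"}}
node_answers = {3: {"TRUE", "FALSE"}, 4: {"TRUE", "FALSE"}}
start_edges = place_start_marks(edges, edge_answers, node_answers)
```

In this toy graph the two edges entering the merge node each carry one answer, so both receive a start mark, while the edge leaving the merge node does not.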

The algorithm we have presented can be used to exhaustively compute infeasible paths or to incrementally update infeasible path information. Consider a statement n at which the set of path marks present is not empty. Any modification to n or insertion of new statements in n will require the infeasible paths included in the present set to be recomputed. In the example, we use regular expression notation to denote the paths.

Figure: Marking infeasible paths on the control flow graph.

The variable names below each node number denote the predicate expression from the query raised at that node. Consider the computation of the infeasible paths for node 7. Using the algorithm in Fig. After path marking Fig. Thus no path going through the out-edge of node 4 can take the false exit of node 7. The shortest infeasible paths exclude 1 2 [3 43 because it is guaranteed that at node 5, the value of w is still the constant 1.

Figure: An example of intraprocedural infeasible paths.

In this section we present a data flow analysis method that provides refined data flow information by tracing the infeasible paths and excluding def-use pairs that are formed exclusively along infeasible paths.

Def-use pairs are determined by solving the data flow problem of reaching definitions. Given the set of definitions that reach a node, we can determine def-use pairs for all of the uses in the node. Traditional data flow analysis conservatively assumes that all program paths are executable and computes the data flow information as a meet operation along all paths. To refine the result of the analysis, we strengthen the definition of a def-use pair: if all paths between a definition and a use contain an infeasible path, then this def-use pair is infeasible and is excluded from the set of def-use pairs found by the analysis.
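For reference, the traditional computation that this section refines can be sketched as a standard iterative reaching-definitions solver. The sketch assumes a simplified program model in which each node defines at most one variable; the names `reaching_definitions`, `def_use_pairs`, `defs`, and `uses` are chosen here for illustration.

```python
from collections import defaultdict


def reaching_definitions(nodes, edges, defs):
    """Iterate to a fixed point; return the IN set of every node.

    defs maps a node to the single variable it defines (assumption:
    at most one definition per node). A definition is a (node, var) pair.
    """
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
    in_sets = {n: set() for n in nodes}
    out_sets = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            # Meet over all predecessors: union, since this is a may-problem.
            in_n = set()
            for p in preds[n]:
                in_n |= out_sets[p]
            if n in defs:
                # Kill other definitions of the same variable, then generate.
                out_n = {(m, v) for (m, v) in in_n if v != defs[n]}
                out_n.add((n, defs[n]))
            else:
                out_n = set(in_n)
            if in_n != in_sets[n] or out_n != out_sets[n]:
                in_sets[n], out_sets[n] = in_n, out_n
                changed = True
    return in_sets


def def_use_pairs(in_sets, uses):
    """uses maps a node to the set of variables it reads."""
    return {(d, u) for u, used in uses.items()
            for (d, v) in in_sets[u] if v in used}


# Diamond-shaped CFG: node 1 defines x, node 2 redefines it on one arm.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
pairs = def_use_pairs(reaching_definitions(nodes, edges, {1: "x", 2: "x"}),
                      {4: {"x"}})
```

Because every path is assumed executable, both definitions of x are paired with the use at node 4. The refinement described here removes such a pair when every connecting path contains an infeasible path.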

To exclude infeasible reaching definitions, we associate with the data flow information of each reaching definition (traditionally, a single bit in a data flow vector) information about infeasible paths that have been encountered in the propagation of the reaching definition.

We call these paths infeasible paths in progress. When the propagation of a reaching definition d encounters the start mark of an infeasible path, we remember the path in the propagated data flow information in order to trace the encountered path. When d reaches the end mark of the path without previously leaving the path, we can remove d from further consideration, as d has traversed an infeasible path.

However, when d leaves the infeasible path before its end mark is reached, the tracing information about the path is removed from the data flow information because the path is no longer in progress, and d is propagated further. Reaching definitions can be computed either using an exhaustive data flow algorithm such as iterative [1] or using a demand-driven algorithm [3].
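The three cases above (entering a path, completing it, and leaving it early) can be illustrated by following one definition along one concrete path. This is a sketch under assumptions: marks are stored per edge in the hypothetical dictionaries `start`, `present`, and `end`, each mapping an edge to a set of path names, and the function name `survives` is invented here.

```python
def survives(path_edges, start, present, end):
    """Follow one definition along path_edges.

    Returns False if the definition completes an infeasible path and is
    therefore removed, True if it is still propagated at the end.
    """
    in_progress = set()
    for edge in path_edges:
        # Paths without a present mark on this edge have been left.
        in_progress &= present.get(edge, set())
        # A start mark begins tracing a new path.
        in_progress |= start.get(edge, set())
        # Reaching an end mark while the path is in progress kills d.
        if in_progress & end.get(edge, set()):
            return False
    return True


# One infeasible path "p" covering edges a, b, c (toy data).
start = {"a": {"p"}}
present = {"a": {"p"}, "b": {"p"}, "c": {"p"}}
end = {"c": {"p"}}

stays_on_path = survives(["a", "b", "c"], start, present, end)
leaves_early = survives(["a", "x", "c"], start, present, end)
```

The first walk stays on p from its start mark to its end mark, so the definition is dropped. The second walk detours through an edge without a present mark, which clears the tracing information, so the definition survives the later end mark.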

In an exhaustive algorithm, the reaching definitions are computed for all variables at all nodes. In the demand-driven algorithm, the reaching definitions for each variable used in a node are computed. Recent studies have demonstrated that demand-driven algorithms take less time and space to compute reaching definitions than exhaustive ones, even when the computation of all def-use pairs is required [4,12].

We present in this paper the demand-driven version of def-use analysis. It computes def-use pairs for a variable v used at a node u.

Similar to the demand-driven algorithm for detecting infeasible paths, the analysis propagates queries, and a query is removed once it has traversed an infeasible path. Removing the query ensures that when a definition of v is encountered by a query, only a feasible def-use pair is recorded. To determine when a query can be safely discarded, a set of paths in progress is carried with each query and is updated by the algorithm as the query traverses path marks.

The initial query is formed and raised first. The query has a single component, ipp, the set of infeasible paths in progress.

Initially, this set is empty. Note that, since the propagation proceeds in the opposite direction as in the exhaustive analysis, the end mark is considered to be the start of the path. Line 6 considers each propagated query. Function resolve updates the ipp information for the query and determines if a reaching definition has been encountered. Line 8 discards the query if an infeasible path in progress ends at the edge e.

Lines 9 to 14 update the tracing information. The subsequent lines record a new def-use pair and terminate propagation of the query if a definition has been reached. Procedure raise-query performs the meet data flow function for the ipp information. While the problem of reaching definitions is a may-problem, the set ipp computes a must-problem; a path in progress is preserved at a control flow meet point only if it was in progress in each query that reached the meet point.

Figure: Intraprocedural demand-driven def-use analysis.

In the algorithm, the query is propagated further in line 24 when it is raised at the node for the first time (line 22) or when a path previously in progress at node m has been removed. For the example, in response to a query raised at the use of z in node 8, our def-use analysis excludes the def-use pair (4,8) because the propagated query is removed at the edge (4,5). This edge is the start of the infeasible path that ends at the false exit of node 7.
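The backward propagation described above can be sketched as a small worklist procedure. This is a simplified illustration, not the algorithm from the figure: the function name `demand_def_use`, the per-edge mark dictionaries, the toy control flow graph, and the rule that each node defines at most one variable are assumptions made here. The sketch also updates the tracing information before testing for a completed path, whereas the text performs the test first.

```python
from collections import defaultdict, deque


def demand_def_use(edges, defs, var, use, start, present, end):
    """Find definitions of var that reach node use along feasible paths."""
    preds = defaultdict(list)
    for m, n in edges:
        preds[n].append(m)
    ipp_at = {use: frozenset()}          # paths in progress per raised query
    worklist = deque([use])
    pairs = set()
    while worklist:
        n = worklist.popleft()
        ipp = ipp_at[n]
        for m in preds[n]:
            e = (m, n)
            # Going backward, an end mark begins tracing a path.
            new = (ipp & present.get(e, frozenset())) | end.get(e, frozenset())
            if new & start.get(e, frozenset()):
                continue                 # whole infeasible path traversed
            if defs.get(m) == var:
                pairs.add((m, use))      # feasible def-use pair
                continue
            if m not in ipp_at:
                ipp_at[m] = new
                worklist.append(m)
            else:
                merged = ipp_at[m] & new  # must-problem: intersect at meets
                if merged != ipp_at[m]:
                    ipp_at[m] = merged
                    worklist.append(m)
    return pairs


# Toy CFG: 1: z=..., 2: if c, 3: z=... (true arm), 4: skip (false arm),
# 5: join, 6: if c again, 7: use z (true arm), 8: use z (false arm).
edges = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (5, 6), (6, 7), (6, 8)]
defs = {1: "z", 3: "z"}
start = {(2, 3): {"p1"}, (2, 4): {"p2"}}
end = {(6, 8): {"p1"}, (6, 7): {"p2"}}
present = {(2, 3): {"p1"}, (3, 5): {"p1"}, (6, 8): {"p1"},
           (2, 4): {"p2"}, (4, 5): {"p2"}, (6, 7): {"p2"},
           (5, 6): {"p1", "p2"}}

refined = demand_def_use(edges, defs, "z", 7, start, present, end)
traditional = demand_def_use(edges, defs, "z", 7, {}, {}, {})
```

With empty mark sets the procedure degenerates to a traditional demand-driven analysis and pairs both definitions with the use at node 7. With the marks for the two correlated branches, the query that travels back through the false arm is discarded at edge (2,4), so only the definition at node 3 remains.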

The def-use pair (4,12) on variable z is excluded due to the first infeasible path leading to the false exit of.

Finally, the def-use pair (2,12) on y is excluded due to the infeasible path that leads to the true exit of node 7. The demand-driven algorithm, which finds the definitions reaching a given use, is also useful for determining more precise program slices [17].

By repeated application of this algorithm, the data slice corresponding to a given statement node can be easily computed. Due to the refined def-use analysis, this algorithm computes smaller slices than traditional slicing algorithms.

Time complexity. The cost of our technique can be divided between the infeasible paths analysis and the def-use analysis. The cost of the former is dominated by Step 1 in Fig.

In our experiments, the pattern of analyzed conditionals was restricted to branches that compare a variable with a constant. Under these restrictions, the cost to find infeasible paths leading to a single branch is O(NV), where N is the number of nodes in the program CFG, and V is the number of program variables. All infeasible paths can be found in O(N^2 V) steps. The cost of finding all def-use pairs for a single use is bounded by O(NI), where I is the maximum number of infeasible paths that cross a node.
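The step from the per-branch bound to the whole-program bound can be spelled out as follows. The symbol B for the number of conditional branches is introduced here, together with the assumption that B is at most N, since every branch is a node of the CFG.

```latex
\[
  \underbrace{B}_{\text{branches},\; B \le N} \cdot \, O(NV)
  \;\subseteq\; O(N \cdot NV)
  \;=\; O(N^{2}V)
\]
```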

The value of I bounds the number of times a query can be re-raised on a single node (line 24 of the algorithm). All def-use pairs can thus be found in O(N^2 I) steps. When implementing a practical def-use analyzer, it is however not required to develop both techniques interprocedurally. Obviously, interprocedural def-use analysis can benefit from purely intraprocedural infeasible paths.

In the remainder of this section we describe the interprocedural versions of both infeasible path analysis and the def-use analysis. Detecting infeasible paths. The interprocedural version of infeasible path analysis from Fig. The extension is based on propagating queries between callers and callees and maintaining appropriate procedure summary nodes.

The infeasible path marking algorithm in Fig. Interprocedural Reaching Definitions. The extension of the def-use analysis requires the elimination of reaching definitions that occur exclusively along infeasible interprocedural paths. This may affect not only interprocedural but also intraprocedural def-use pairs.

For example, at a call site node, a reaching definition may be killed along one path through the called procedure but may reach the procedure exit along the other possible path through the callee. If the latter path is found infeasible by interprocedural analysis, this interprocedural exclusion of the reaching definition may eliminate an intraprocedural def-use pair in the calling procedure.

The extension involves the introduction of procedure summary nodes, which are computed independently of the calling context. The summary node of a procedure maps a variable and a specific set of infeasible paths in progress ipp to the set of reaching definitions generated within the procedure.

Due to the many possible subsets of the ipp set, the amount of information to be stored in the summary node is significant. However, since only a fraction of a summary node will likely be referenced, we present in this section the demand-driven def-use analysis based on [3], which achieves efficiency by computing only the needed subset of each summary node.

In the context of query-based analysis, the purpose of a summary node is to cache, for each query raised on the exit of a procedure, the answers to the query that were found within the procedure and its callees. In our reaching definitions analysis, each query is associated with the analyzed variable v and the set of infeasible paths ipp encountered during propagation of the query. The summary node entry S[x, v, ipp] is thus a triple (RD, transp, ipp). In the algorithm, whenever a query is about to enter a procedure exit node, the summary node is looked up for a previously cached result.

If the lookup fails, the summary node entry is computed by raising at the procedure exit an identical query, which is however marked as a special, summary node query.

Figure: Interprocedural demand-driven def-use analysis.

To support both standard and summary node queries, the algorithm represents a query with a pair (ipp, sn), where sn is a pointer to the summary node entry that is being computed by the query.
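The lookup-then-compute behaviour of summary nodes can be sketched as a small cache. This is an illustration under assumptions, not the paper's data structure: the class names `SummaryEntry` and `SummaryCache`, the callback `compute`, and the reading of `transp` as "the query passes through the procedure to its entry" are introduced here, and recursion between procedures is not handled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Hashable, Set, Tuple


@dataclass
class SummaryEntry:
    rd: Set[Hashable] = field(default_factory=set)  # definitions found inside
    transp: bool = False       # assumed meaning: query reaches procedure entry
    ipp: FrozenSet[str] = frozenset()  # paths still in progress at the entry


class SummaryCache:
    """Cache of summary entries keyed by (exit node, variable, ipp)."""

    def __init__(self, compute: Callable[[Hashable, str, FrozenSet[str]],
                                         SummaryEntry]):
        self._compute = compute  # hypothetical: runs the summary node query
        self._entries: Dict[Tuple[Hashable, str, FrozenSet[str]],
                            SummaryEntry] = {}
        self.misses = 0

    def lookup(self, exit_node, var, ipp) -> SummaryEntry:
        key = (exit_node, var, frozenset(ipp))
        if key not in self._entries:
            # Lookup failed: compute the entry once, on demand.
            self.misses += 1
            self._entries[key] = self._compute(exit_node, var, key[2])
        return self._entries[key]
```

Keying on the ipp set means the same procedure and variable can have several entries, which is why only the entries that are actually requested are computed.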

The initial query is created at line 2, where the nil value of the sn pointer signifies that this is a standard query. The transp and ipp fields of the entry are then recorded. Line 14 raises the query in all callers, restricting the scope of summary node queries to the procedure.

The procedures resolve and raise-query from Fig. An example of interprocedural infeasible paths is shown in Fig. An example of interprocedural infeasible paths.

The experiments were performed on the integer benchmarks from the SPEC benchmark suite. All benchmarks are real application programs and, as Table 1 shows, are of considerable size. The first three columns list the number of source lines, the number of procedures defined in the program, and the number of external library

Another implementation restriction was that the function substitute used in Fig. Our infeasible path detection technique, however, supports the analysis of arbitrary predicate expressions and is limited only by the capabilities of the symbolic evaluation routines in the compiler. The effectiveness of our branch correlation analysis is described in the last group of columns in Table 1.

The column analyzable describes what percentage of all conditionals in the graph were analyzed by the analysis, given the restrictions mentioned above. The columns con-P and con-I give the number of conditionals that have some correlation, detected using intra- and inter-procedural infeasible paths analyses, respectively.

Table 1. The benchmarks: program size and amount of correlated branches.

Table 2. The costs and results of our def-use analysis.

The number of correlated conditionals is given as a percentage of all conditional nodes. The comparison of the traditional def-use analysis and our def-use analysis is given in the last three columns of Table 2. Both analyses were restricted to intraprocedural def-use pairs and each call site node was assumed to be a definition of each global variable. Column tradit gives the number of def-use pairs found by the traditional def-use analysis.

The columns elim-P and elim-I give the number of pairs that were eliminated by our def-use analysis, using the intra- and inter-procedural infeasible paths analyses, respectively.

With intraprocedural infeasible paths, we are able to eliminate 2. While this may appear to be a small amount, knowing these infeasible def-use pairs will significantly strengthen the confidence in the testing level of a program. Table 2 also presents the cost of our analysis. Since the set of infeasible paths ipp is best implemented by a bit vector, we are interested in the maximum and average size of the present marking sets, across all edges in the control flow graph. The low average values suggest that many edges in the graph contain no infeasible path markings.

Note that when the ipp set and all three mark sets are empty at lines in Fig. Next, we report the number of steps performed by the considered demand-driven def-use analyses, measured in the number of times a query was removed from the worklist at line 5 in Fig.

We can observe that infeasibility-aware analysis does not require significantly more steps to terminate than the traditional def-use analysis. First, program paths that are infeasible due to branch correlation are detected and marked on the flow graph.

Understanding Data Flow Diagrams

Fabian Khan. Kara Para. Le Vie, Jr.

DFDs consist of four major components: entities, processes, data stores, and data flows. The symbols used to depict how these components interact in a system are simple and easy to understand; ... DFD syntax does remain constant by using simple verb and noun constructs.

1. DFDs are easier to understand by technical and nontechnical audiences
2. DFDs can provide a high level system overview, complete with boundaries and connections to other systems
3. DFDs can provide a detailed representation of system components

DFDs help system designers and others during initial analysis stages visualize a current system or one that may be necessary to meet new requirements. ... between existing systems and postulated systems. DFDs represent the following:

1. External devices sending and receiving data
2. Processes that change that data
3. Data flows themselves
4. Data storage locations

... flowcharts and pseudocode (a combination of English and the programming language being written) to design programs. Pseudocode can actually be simpler to read than corresponding flowcharts, as Figure 1 illustrates.

Figure 1. [Flowchart with the labels "Is plant a succulent", "Add some water", and "Add lots of water", shown next to the corresponding pseudocode.]

An ERD shows what data is being used in the process or program, and how the files are related. The E-R (entity-relationship) data model views the real world as a set of basic objects (entities) and relationships among these objects. It is intended primarily for the database design process by allowing for the specification of an enterprise scheme. This enterprise scheme represents the overall logical structure of the database.

Figure 2. An Example of an Entity-Relationship Diagram.

Data Flow Diagrams

Data flow diagrams have replaced flowcharts and ... A DFD is illustrated in Figure 3.

Figure 3. An Example of a Data Flow Diagram.

Process

... In other words, a process receives input and generates some output. Processes can be drawn as circles or a segmented rectangle on a DFD, and include a process name and process number.

Data Store

A data store is where a process stores data between processes for later retrieval by that same process or another one. Files and tables are considered data stores.

Data Flow

Data flow is the movement of data between the entity, the process, and the data store. Data flow portrays the interface between the components of the DFD. The flow of data in a DFD is named to reflect the nature of the data used (these names should also be unique within a specific DFD). Data flow is represented by an arrow, where the arrow is annotated with the data name.

Entity

An entity is the source or destination of data. The source in a DFD represents these entities that are outside the context of the system. Entities either provide data to the system (referred to as a source) or receive data from it (referred to as a sink). Entities are often represented as rectangles (a diagonal line across the right-hand corner ... Entities are also referred to as agents, ...

Figure 4. [Symbol examples: a process "Create Student Record" (could be drawn as a circle), a data store "B2 Student Record", and external entities.]

We begin by making a list of business activities to determine the DFD elements (external entities, data flows, processes, and data stores). The Diagram-0, or Level 0 diagram, is next, which reveals general processes and data stores (see Figures 5 and 6). Following the drawing of Level 0 diagrams, child diagrams will be drawn (Level 1 diagrams) for each process illustrated by Level 0 diagrams.

There are no hard and fast rules when it comes to producing DFDs, but there are when it comes to valid data flows. For the most accurate DFDs, you need to become intimate with the details of the use case study and functional specification. Keep in mind that if your DFD looks like a Picasso, it could be an accurate representation of your current physical system.

Preliminary Investigation of Text Information

The first step is to determine the data items, which are usually located in documents but not always. Construct a table to organize your information, as shown in Table 1.
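The four DFD components and the naming rule for data flows can be captured in a small data model. This is a sketch for illustration only: the class and function names are invented here, and the check that every flow touches at least one process is a common DFD convention that the text above does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    kind: str  # "entity", "process", or "store"


@dataclass(frozen=True)
class DataFlow:
    name: str
    source: Component
    target: Component


def check_flows(flows):
    """Return a list of problems found in the given data flows."""
    problems = []
    seen = set()
    for flow in flows:
        # Flow names should be unique within one DFD.
        if flow.name in seen:
            problems.append(f"duplicate flow name: {flow.name}")
        seen.add(flow.name)
        # Assumed convention: a flow must start or end at a process.
        if "process" not in (flow.source.kind, flow.target.kind):
            problems.append(f"flow {flow.name} does not touch a process")
    return problems


# Hypothetical example loosely based on the student-record labels.
student = Component("Student", "entity")
create = Component("Create Student Record", "process")
records = Component("Student Record", "store")

flows = [
    DataFlow("student data", student, create),
    DataFlow("student record", create, records),
    DataFlow("student record", records, student),  # reused name, no process
]
problems = check_flows(flows)
```

The third flow is reported twice: once for reusing a flow name and once for connecting a data store directly to an entity.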


