Network Working Group Request for Comments: 911 EGP GATEWAY UNDER BERKELEY UNIX 4.2 PAUL KIRTON University of Southern California, Information Sciences Institute Visiting Research Fellow from Telecom Australia Research Laboratories 22 August 1984 ABSTRACT This report describes an implementation of the Exterior Gateway Protocol that runs under the Unix 4.2 BSD operating system. Some issues related to local network configurations are also discussed. Status of this Memo: This memo describes an implementation of the Exterior Gateway Protocol (EGP) (in that sense it is a status report). The memo also discusses some possible extentions and some design issues (in that sense it is an invitation for further discussion). Distribution of this memo is unlimited. Funding for this research was provided by DARPA and Telecom Australia. RFC 911 i Table of Contents 1. INTRODUCTION 1 1.1 Motivation for Development 1 1.2 Overview of EGP 2 2. GATEWAY DESIGN 4 2.1 Routing Tables 4 2.1.1 Incoming Updates 5 2.1.2 Outgoing Updates 5 2.2 Neighbor Acquisition 6 2.3 Hello and Poll Intervals 6 2.4 Neighbor Cease 7 2.5 Neighbor Reachability 7 2.6 Sequence Numbers 8 2.7 Treatment of Excess Commands 8 2.8 Inappropriate Messages 8 2.9 Default Gateway 9 3. TESTING 10 4. FUTURE ENHANCEMENTS 11 4.1 Multiple Autonomous Systems 11 4.2 Interface Monitoring 11 4.3 Network Level Status Information 11 4.4 Interior Gateway Protocol Interface 12 5. TOPOLOGY ISSUES 13 5.1 Topology Restrictions and Routing Loops 13 5.1.1 Background 13 5.1.2 Current Policy 14 5.2 Present ISI Configuration 15 5.2.1 EGP Across ARPANET 17 5.2.2 EGP Across ISI-NET 17 5.2.3 Potential Routing Loop 18 5.3 Possible Future Configuration 18 5.3.1 Gateway to UCI-ICS 18 5.3.2 Dynamic Switch to Backup Gateway 19 5.3.2.1 Usual Operation 19 5.3.2.2 Host Initialization 19 5.3.2.3 When Both the Primary and Backup are Down 20 5.3.2.4 Unix 4.2 BSD 20 6. ACKNOWLEDGEMENT 21 7. REFERENCES 22 RFC 911 1 1. INTRODUCTION The Exterior Gateway Protocol (EGP) [Rosen 82; Seamonson & Rosen 84; Mills 84a] has been specified to allow autonomous development of different gateway systems while still maintaining global distribution of internet routing information. EGP provides a means for different autonomous gateway systems to exchange information about the networks that are reachable via them. This report mainly describes an implementation of EGP that runs as a user * ** process under the Berkeley Unix 4.2 operating system run on a VAX computer. Some related issues concerning local autonomous system configurations are also discussed. The EGP implementation is experimental and is not a part of Unix 4.2 BSD. It is anticipated that Berkeley will incorporate a version of EGP in the future. The program is written in C. The EGP part is based on the C-Gateway code written by Liza Martin at MIT and the route management part is based on Unix 4.2 BSD route management daemon, "routed". The EGP functions are consistent with the specification of [Mills 84a] except where noted. A knowledge of EGP as described in [Seamonson & Rosen 84; Mills 84a] is assumed. This chapter discusses the motivation for the project, Chapter 2 describes the gateway design, Chapter 3 is on testing, Chapter 4 suggests some enhancements and Chapter 5 discusses topology issues. Further information about running the EGP program and describing the software is being published in an ISI Research Report ISI/RR-84-145 [Kirton 84]. Requests for documentation and copies of the EGP program should be sent to Joyce Reynolds (JKReynolds@USC-ISIF.ARPA). Software support is not provided. 1.1 Motivation for Development With the introduction of EGP, the internet gateways will be divided into a "core" autonomous system (AS) of gateways maintained by Bolt, Beranek and Newman (BBN) and many "stub" AS's that are maintained by different organizations and have at least one network in common with a core AS gateway. The core AS will act as a hub for passing on routing information between _______________ * Unix is a trade mark of AT&T ** VAX is a trade mark of Digital Equipment Corporation RFC 911 2 different stub AS's so that it will only be necessary for stub AS's to conduct EGP with a core gateway. Further detail is given in [Rosen 82]. At the time of this project there were 28 "non-routing" gateways in the internet. Non-routing gateways did not exchange routing information but required static entries in the core gateway routing tables. Since August 1, 1984 these static entries have been eliminated and previously non-routing gateways are required to communicate this information to the core gateways dynamically via EGP [Postel 84]. At the USC Information Sciences Institute (ISI) there was a non-routing gateway to the University of California at Irvine network (UCI-ICS). With the elimination of non-routing gateways from the core gateway tables it is necessary to inform the core ISI gateway of the route to UCI-ICS using EGP. Also, we would like a backup gateway between ISI-NET and the ARPANET in case the core ISI gateway is down. Such, a gateway would need to convey routing information via EGP. Details of the ISI network configuration are discussed in Section 5.2. Of the 28 non-routing gateways 23 were implemented by Unix systems, including ISI's. Also, ISI's proposed backup gateway was a Unix system. Thus there was a local and general need for an EGP implementation to run under Unix. The current version of Unix that included Department of Defense (DoD) protocols was Berkeley Unix 4.2 so this was selected. 1.2 Overview of EGP This report assumes a knowledge of EGP, however a brief overview is given here for completeness. For further details refer to [Rosen 82] for the background to EGP, [Seamonson & Rosen 84] for an informal description, and [Mills 84a] for a more formal specification and implementation details. EGP is generally conducted between gateways in different AS's that share a common network, that is, neighbor gateways. EGP consists of three procedures, neighbor acquisition, neighbor reachability and network reachability. Neighbor acquisition is a two way handshake in which gateways agree to conduct EGP by exchanging Request and Confirm messages which include the minimum Hello and Poll intervals. Acquisition is terminated by exchanging Cease and Cease-ack messages. Neighbor reachability is a periodic exchange of Hello commands and I-H-U (I heard you) responses to ensure that each gateway is up. Currently a 30 second minimum interval is used across ARPANET. Only one gateway need send commands as the other can use them to determine reachability. A gateway sending reachability commands is said to be in the active mode, while a gateway that just responds is in the passive mode. RFC 911 3 Network reachability is determined by periodically sending Poll commands and receiving Update responses which indicate the networks reachable via one or more gateways on the shared network. Currently 2 minute minimum interval is used across ARPANET. RFC 911 4 2. GATEWAY DESIGN EGP is a polling protocol with loose timing constraints. Thus the only gateway function requiring good performance is packet forwarding. Unix 4.2 already has packet forwarding built into the kernel where best performance can be achieved. At the time of writing Unix 4.2 did not send ICMP (Internet Control Message Protocol) redirect messages for misrouted packets. This is a requirement of internet gateways and will later be added by Berkeley. The EGP and route update functions are implemented as a user process. This facilitates development and distribution as only minor changes need to be made to the Unix kernel. This is a similar approach to the Unix route distribution program "routed" [Berkeley 83] which is based on the Xerox NS Routing Information Protocol [Xerox 81]. 2.1 Routing Tables A route consists of a destination network number, the address of the next gateway to use on a directly connected network, and a metric giving the distance in gateway hops to the destination network. There are two sets of routing tables, the kernel tables (used for packet forwarding) and the EGP process tables. The kernel has separate tables for host and network destinations. The EGP process only maintains the network routing tables. The EGP tables are updated when EGP Update messages are received. When a route is changed the kernel network tables are updated via the SIOCADDRT and SIOCDELRT ioctl system calls. At initialization the kernel network routing tables are read via the kernel memory image file, /dev/kmem, and copied into the EGP tables for consistency. This EGP implementation is designed to run on a gateway that is also a host. Because of the relatively slow polling to obtain route updates it is possible that the host may receive notification of routing changes via ICMP redirects before the EGP process is notified via EGP. Redirects update the kernel tables directly. The EGP process listens for redirect messages on a raw socket and updates its routing tables to keep them consistent with the kernel. The EGP process routing tables are maintained as two separate tables, one for exterior routes (via different AS gateways) and one for interior routes (via the gateways of this AS). The exterior routing table is updated by EGP Update messages. The interior routing table is currently static and is set at initialization time. It includes all directly attached nets, determined by the SIOCGIFCONF ioctl system call and any interior non-routing gateways read from the EGP initialization file, EGPINITFILE. The interior routing table could in future be updated dynamically by an Interior Gateway Protocol (IGP). Maintaining separate tables for exterior and interior routing facilitates the preparation of outgoing Update messages which only contain interior routing information [Mills 84b]. It also permits alternative external routes to the internal routes to be saved as a backup in case an interior route fails. Alternate routes are flagged, RTS_NOTINSTALL, to indicate that the kernel RFC 911 5 routes should not be updated. In the current implementation alternate routes are not used. 2.1.1 Incoming Updates EGP Updates are used to update the exterior routing table if one of the following is satisfied: - No routing table entry exists for the destination network and the metric indicates the route is reachable (< 255). - The advised gateway is the same as the current route. - The advised distance metric is less than the current metric. - The current route is older (plus a margin) than the maximum poll interval for all acquired EGP neighbors. That is, the route was omitted from the last Update. If any exterior route entry, except the default route, is not updated by EGP within 4 minutes or 3 times the maximum poll interval, whichever is the greater, it is deleted. If there is more than one acquired EGP neighbor, the Update messages received from each are treated the same way in the order they are received. In the worst case, when a route is changed to a longer route and the old route is not first notified as unreachable, it could take two poll intervals to update a route. With the current poll interval this could be 4 minutes. Under Unix 4.2 BSD TCP connections (Transmission Control Protocol) are closed automatically after they are idle for 6 minutes. So this worst case will not result in the automatic closure of TCP connections. 2.1.2 Outgoing Updates Outgoing Updates include the direct and static networks from the interior routing table, except for the network shared with the EGP neighbor. The networks that are allowed to be advised in Updates may be specified at initialization in EGPINITFILE. This allows particular routes to be excluded from exterior updates in cases where routing loops could be a problem. Another case where this option is necessary, is when there is a non-routing gateway belonging to a different AS which has not implemented EGP yet. Its routes may need to be included in the kernel routing table but they are not allowed to be advised in outgoing updates. If the interior routing table includes other interior gateways on the network shared with the EGP neighbor they are include in Updates as the appropriate RFC 911 6 first hop to their attached networks. The distance to networks is set as in the interior routing table except if the route is marked down in which case the distance is set to 255. At present routes are only marked down if the outgoing interface is down. The state of all interfaces is checked prior to preparing each outgoing Update using the SIOCGIFFLAGS ioctl system call. Unsolicited Updates are not sent. 2.2 Neighbor Acquisition EGPINITFILE lists the addresses of trusted EGP neighbor gateways, which are read at initialization. These will usually be core gateways as only core gateways provide full internet routing information. At the time of writing there were three core gateways on ARPANET which support EGP, CSS-GATEWAY, ISI-GATEWAY and PURDUE-CS-GW, and two on MILNET, BBN-MINET-A-GW and AERONET-GW. EGPINITFILE also includes the maximum number of these gateways that should be acquired at any one time. This is usually expected to be just one. If this gateway is declared down another gateway on the list will then be acquired automatically in sufficient time to ensure that the current routes are not timed out. The gateway will only accept acquisitions from neighbors on the trusted list and will not accept them if it already has acquired its maximum quota. This prevents Updates being accepted from possibly unreliable sources. The ability to acquire core gateways that are not on the trusted list but have been learned of indirectly via Update messages is not included because not all core gateways run EGP. New acquisition Requests are sent to neighbors in the order they appear in EGPINITFILE. No more new Requests than the maximum number of neighbors yet to be acquired are sent at once. Any number of outstanding Requests are retransmitted at 32 second intervals up to 5 retransmissions each at which time the acquisition retransmission interval is increased to 4 minutes. Once the maximum number of neighbors has been acquired, unacquired neighbors with outstanding Requests are sent Ceases. This approach provides a compromise between fast response when neighbors do not initially respond and a desire to minimize the chance that a neighbor may be Ceased after it has sent a Confirm but before it has been received. If the specified maximum number of neighbors cannot be acquired, Requests are retransmitted indefinitely to all unacquired neighbors. 2.3 Hello and Poll Intervals The Request and Confirm messages include minimum values for Hello and Poll intervals. The advised minimums by this and the core gateways are currently 30 and 120 seconds respectively. RFC 911 7 The received intervals are checked against upper bounds to guard against nonsense values. The upper bounds are currently set at 120 and 480 seconds respectively. If, they are exceeded the particular neighbor is considered bad and not sent further Requests for one hour. This allows the situation to be corrected at the other gateway and normal operation to automatically resume from this gateway without an excess of unnecessary network traffic. The actual Hello and Poll intervals are chosen by first selecting the maximum of the intervals advised by this gateway and its peer. A 2 second margin is then added to the Hello interval to take account of possible network delay variations and the Poll interval is increased to the next integer ratio of the Hello interval. This results in 32 second Hello and 128 second Poll intervals. If an Update is not received in response to a Poll, at most one repoll (same sequence number) is sent instead of the next scheduled Hello. 2.4 Neighbor Cease If the EGP process is sent a SIGTERM signal via the Kill command, all acquired neighbors are sent Cease(going down) commands. Ceases are retransmitted at the hello interval at most 3 times. Once all have either responded with Cease-acks or been sent three retransmitted Ceases the process is terminated. 2.5 Neighbor Reachability Only active reachability determination is implemented. It is done as recommended in [Mills 84a] with a minor variation noted below. A shift register of responses is maintained. For each Poll or Hello command sent a zero is shifted into the shift register. If a response (I-H-U, Update or Error) is received with the correct sequence number the zero is replaced by a one. Before each new command is sent the reachability is determined by examining the last four entries of the shift register. If the neighbor is reachable and <= 1 response was received the neighbor is considered unreachable. If the neighbor is considered unreachable and >= 3 responses were received it is now considered reachable. A neighbor is considered reachable immediately after acquisition so that the first poll received from a core gateway (once it considers this gateway reachable) will be responded to with an Update. Polls are not sent unless a neighbor is considered reachable and it has not advised that it considers this gateway unreachable in its last Hello, I-H-U or Poll message. This prevents the first Poll being discarded after a down/up transition. This is important as the Polls are used for reachability determination. Following acquisition at least one message must be received before the first Poll is sent. This is to determine that the peer does not consider this gateway down. This usually requires at least one Hello to be sent prior to the first poll. The discussion of this paragraph differs from [Mills 84a] which recommends that a peer be considered down following acquisition and Polls may be sent as soon as the peer is considered up. This is the only significant departure from the RFC 911 8 recommendations in [Mills 84a]. Polls received by peers that are considered unreachable are sent an Error response which allows their reachability determination to progress correctly. This action is an option within [Mills 84a]. When a neighbor becomes unreachable all routes using it as a gateway are deleted from the routing table. If there are known unacquired neighbors the unreachable gateway is ceased and an attempt is made to acquire a new neighbor. If all known neighbors are acquired the reachability determination is continued for 30 minutes ([Mills 84a] suggests 60 minutes) after which time the unreachable neighbor is ceased and reacquisition attempted every 4 minutes. This is aimed at reducing unnecessary network traffic. If valid Update responses are not received for three successive polls the neighbor is ceased and an alternative acquired or reacquisition is attempted in 4 minutes. This provision is provided in case erroneous Update data formats are being sent by the neighbor. This situation did occur on one occasion during testing. 2.6 Sequence Numbers Sequence numbers are managed as recommended in [Mills 84a]. Single send and receive sequence numbers are maintained for each neighbor. The send sequence number is initialized to zero and is incremented before each new Poll (not repoll) is sent and at no other time. The send sequence number is used in all commands. The receive sequence number is maintained by copying the sequence number of the last Request, Hello, or Poll command received from a neighbor. This sequence number is used in outgoing Updates. All responses (including Error responses) return the sequence number of the message just received. 2.7 Treatment of Excess Commands If more than 20 commands are received from a neighbor in any 8 minute period the neighbor is considered bad, Ceased and reacquisition prevented for one hour. At most one repoll (same sequence number) received before the poll interval has expired (less a 4 second margin for network delay variability) is responded to with an Update, others are sent an Error response. When an Update is sent in response to a repoll the unsolicited bit is not set, which differs from the recommendation in [Mills 84a]. 2.8 Inappropriate Messages If a Confirm, Hello, I-H-U, Poll or Update is received from any gateway (known or unknown) that is in the unacquired state, synchronization has probably been lost for some reason. A Cease(protocol violation) message is sent to try and reduce unnecessary network traffic. This action is an option in [Mills 84a]. RFC 911 9 2.9 Default Gateway A default gateway may be specified in EGPINITFILE. The default route (net 0 in Unix 4.2 BSD) is used by the kernel packet forwarder if there is no specific route for the destination network. This provides a final level of backup if all known EGP neighbors are unreachable. This is especially useful if there is only one available EGP neighbor, as in the ISI case, Section 5.2.2. The default route is installed at initialization and deleted after a valid EGP Update message is received. It is reinstalled if all known neighbors are acquired but none are reachable, if routes time out while there are no EGP neighbors that are acquired and reachable, and prior to process termination. It is deleted after a valid EGP Update message is received because the default gateway will not know any more routing information than learned via EGP. If it were not deleted, all traffic to unreachable nets would be sent to the default gateway under Unix 4.2 forwarding strategy. The default gateway should normally be set to a full-routing core gateway other than the known EGP neighbor gateways to give another backup in case all of the EGP gateways are down simultaneously. RFC 911 10 3. TESTING A few interesting cases that occurred during testing are briefly described. The use of sequence numbers was interpreted differently by different implementers. Consequently some implementations rejected messages as having incorrect sequence numbers, resulting in the peer gateway being declared down. The main problem was that the specification was solely in narrative form which is prone to inconsistencies, ambiguities and incompleteness. The more formal specification of [Mills 84a] has eliminated these ambiguities. When testing the response to packets addressed to a neighbor gateway's interface that was not on the shared net a loop resulted as both gateways repeatedly exchanged error messages indicating an invalid interface. The problem was that both gateways were sending Error responses after checking the addresses but before the EGP message type was checked. This was rectified by not sending an Error response unless it was certain that the message was not itself an Error response. On one occasion a core gateway had some form of data error in the Update messages which caused them to be rejected even though reachability was being satisfactorily conducted. This resulted in all routes being timed out. The solution was to count the number of successive Polls that do not result in valid Updates being received and if this number reaches 3 to Cease EGP and attempt to acquire an alternative gateway. Another interesting idiosyncrasy, reported by Mike Karels at Berkeley, results from having multiple gateways between MILNET and ARPANET. Each ARPANET host has an assigned gateway to use for access to MILNET. In cases where the EGP gateway is a host as well as a gateway, the EGP Update messages may indicate a different MILNET/ARPANET gateway from the assigned one. When the host/gateway originates a packet that is routed via the EGP reported gateway, it will receive a redirect to its assigned gateway. Thus the MILNET gateway can keep being switched between the gateway reported by EGP and the assigned gateway. A similar thing occurs when using routes to other nets reached via MILNET/ARPANET gateways. RFC 911 11 4. FUTURE ENHANCEMENTS 4.1 Multiple Autonomous Systems The present method of acquiring a maximum number of EGP neighbors from a trusted list implies that all the neighbors are in the same AS. The intention is that they all be members of the core AS. When updating the routing tables, Updates are treated independently with no distinction made as to whether the advised routes are internal or external to the peer's AS. Also, routing metrics are compared without reference to the AS of the source. If EGP is to be conducted with additional AS's beside the core AS, all neighbors on the list would need to be acquired in order to ensure that gateways from both AS's were always acquired. This results in an unnecessary excess of EGP traffic if redundant neighbors are acquired for reliability. A more desirable approach would be to have separate lists of trusted EGP gateways and the maximum number to be acquire, for each AS. Routing entries would need to have the source AS added so that preference could be given to information received from the owning AS (see Section 5.1.2) 4.2 Interface Monitoring At present, interface status is only checked immediately prior to the sending of an Update in response to a Poll. The interface status could be monitored more regularly and an unsolicited Update sent when a change is detected. This is one area where the slow response of EGP polling could be improved. This is of particular interest to networks that may be connected by dial-in lines. When such a network dials in, its associated interface will be marked as up but it will not be able to receive packets until the change has been propagated by EGP. This is one case where the unsolicited Update message would help, but there is still the delay for other non-core gateways to poll core EGP gateways for the new routing information. This was one case where it was initially thought that a kernel EGP implementation might help. But the kernel does not presently pass interface status changes by interrupts so a new facility would need to be incorporated. If this was done it may be just as easy to provide a user level signal when an interface status changes. 4.3 Network Level Status Information At present, network level status reports, such as IMP Destination Unreachable messages, are not used to detect changes in the reachability of EGP neighbors or other neighbor gateways. This information should be used to improve the response time to changes. RFC 911 12 4.4 Interior Gateway Protocol Interface At present any routing information that is interior to the AS is static and read from the initialization file. The internal route management functions have been written so that it should be reasonably easy to interface an IGP for dynamic interior route updates. This is facilitated by the separation of the exterior and interior routing tables. The outgoing EGP Updates will be correctly prepared from the interior routing table by rt_NRnets() whether or not static or dynamic interior routing is done. Functions are also provided for looking up, adding, changing and deleting internal routes, i.e. rt_int_lookup(), rt_add(), rt_change() and rt_delete() respectively. The interaction of an IGP with the current data structures basically involves three functions: updating the interior routing table using a function similar to rt_NRupdate(), preparing outgoing interior updates similarly to rt_NRnets(), and timing out interior routes similarly to rt_time(). RFC 911 13 5. TOPOLOGY ISSUES 5.1 Topology Restrictions and Routing Loops 5.1.1 Background EGP is not a routing algorithm. it merely enables exterior neighbors to exchange routing information which is likely to to be needed by a routing algorithm. It does not pass sufficient information to prevent routing loops if cycles exist in the topology [Rosen 82]. Routing loops can occur when two gateways think there are alternate routes to reach a third gateway via each other. When the third gateway goes down they end up pointing to each other forming a routing loop. Within the present core system, loops are broken by counting to "infinity" (the internet diameter in gateway hops). This (usually) works satisfactorily because GGP propagates changes fairly quickly as routing updates are sent as soon as changes occur. Also the diameter of the internet is quite small (5) and a universal distance metric, hop count, is used. But this will be changed in the future. With EGP, changes are propagated slowly. Although a single unsolicited NR message can be sent, it won't necessarily be passed straight on to other gateways who must hear about it indirectly. Also, the distance metrics of different AS's are quite independent and hence can't be used to count to infinity. The initial proposal was to prevent routing loops by restricting the topology of AS's to a tree structure so that there are no multiple routes via alternate AS's. Multiple routes within the same AS are allowed as it is the interior routing strategies responsibility to control loops. [Mills 84b] has noted that even with the tree topology restriction, "we must assume that transient loops may form within the core system from time to time and that this information may escape to other systems; however, it would be expected that these loops would not persist for very long and would be broken in a short time within the core system itself. Thus a loop between non-core systems can persist until the first round of Update messages sent to the other systems after all traces of the loop have been purged from the core system or until the reachability information ages out of the tables, whichever occurs first". With the initial simple stub EGP systems the tree topology restriction could be satisfied. But for the long term this does not provide sufficient robustness. [Mills 83] proposed a procedure by which the AS's can dynamically reconfigure themselves such that the topology restriction is always met, without the need for a single "core" AS. One AS would own a shared net and its neighbor AS's would just conduct EGP with the owner. The owner would pass on such information indirectly as the core system does now. If the owning AS is defined to be closest to the root of the tree topology, any haphazard interconnection can RFC 911 14 form itself into an appropriate tree structured routing topology. By routing topology I mean the topology as advised in routing updates. There may well be other physical connections but if they are not advised they will not be used for routing. Each AS can conduct EGP with at most one AS that owns one of its shared nets. Any AS that is not conducting EGP over any net owned by another AS is the root of a subtree. It may conduct EGP with just one other AS that owns one of its shared nets. This "attachment" combines the two subtrees into a single subtree such that the overall topology is still a tree. Topology violations can be determined because two different AS's will report that they can reach the same net. With such a dynamic tree, there may be preferred and backup links. In such cases it is necessary to monitor the failed link so that routing can be changed back to the preferred link when service is restored. Another aspect to consider is the possibility of detecting routing loops and then breaking them. Expiration of the packet time-to-live (TTL) could be used to do this. If such a loop is suspected a diagnostic packet, such as ICMP echo, could be sent over the suspect route to confirm whether it is a loop. If a loop is detected a special routing packet could be sent over the route that instructs each gateway to delete the route after forwarding the packet on. The acceptance of new routing information may need to be delayed for a hold down period. This approach would require sensible selection of the initial TTL. But this is not done by many hosts. 5.1.2 Current Policy Considering the general trend to increased network interconnection and the availability of alternative long-haul networks such as ARPANET, WBNET (wideband satellite network), and public data networks the tree topology restriction is generally unacceptable. A less restrictive topology is currently recommended. The following is taken from [Mills 84b]. EGP topological model: - An autonomous system consists of a set of gateways connected by networks. Each gateway in the system must be reachable from every other gateway in its system by paths including only gateways in that system. - A gateway in a system may run EGP with a gateway in any other system as long as the path over which EGP itself is run does not include a gateway in a third system. - The "core system" is distinguished from the others by the fact that only it is allowed to distribute reachability information about systems other than itself. - At least one gateway in every system must have a net in common with a gateway in the core system. RFC 911 15 - There are no topological or connectivity restrictions other than those implied above. A gateway will use information derived from its configuration (directly connected nets), the IGP of its system, called S in the following, (interior nets) and EGP (interior and exterior nets of neighboring systems) to construct its routing tables. If conflicts with respect to a particular net N occur, they will be resolved as follows: - If N is directly connected to the gateway, all IGP and EGP reports about N are disregarded. - If N is reported by IGP as interior to S and by EGP as either interior or exterior to another system, the IGP report takes precedence. - If N is reported by EGP as interior to one system and exterior to another, the interior report takes precedence. - If N is reported as interior by two or more gateways of the same system using EGP, the reports specifying the smallest hop count take precedence. - In all other cases the latest received report takes precedence. Old information will be aged from the tables. The interim model provides an acceptable degree of self-organization. Transient routing loops can occur between systems, but these are eventually broken by old reachability information being aged out of the tables. Given the fact that transient loops can occur due to temporary core-system loops, the additional loops that might occur in the case of local nets homed to multiple systems does not seem to increase the risk significantly. 5.2 Present ISI Configuration A simplified version of the ISI network configuration is shown in Figure 5-1. ISI-Hobgoblin can provide a backup gateway function to the core ISI-Gateway between ARPANET and ISI-NET. ISI-Hobgoblin is a VAX 11/750 which runs Berkeley Unix 4.2. The EGP implementation described in this report is run on ISI-Hobgoblin. ISI-Troll is part of a split gateway to the University of California at Irvine network (UCI-ICS). The complete logical gateway consists of ISI-Troll, the 9600 baud link and UCI-750A [Rose 84]. ISI-Troll runs Berkeley Unix 4.1a and hence cannot run the EGP program. It is therefore a non-routing gateway. The existence of UCI-ICS net must be advised to the core AS by ISI-Hobgoblin. This can be done by including an appropriate entry in the EGPINITFILE. Hosts on ISI-NET, including ISI-Troll, have static route entries indicating ISI-Gateway as the first hop for all networks other than UCI-ICS and ISI-NET. RFC 911 16 ------------------------------------------------- / \ / ARPANET \ \ 10 / \ / ------------------------------------------------- | | | | | | | | | +-------------+ +-------------+ +---------------+ | ISI-PNG11 | | | | | | Arpanet | | ISI-GATEWAY | | ISI-HOBGOBLIN | | Address | | | | Vax 11/750 | | logical | | Core EGP | | Unix 4.2 | | multiplexer | | | | | +-------------+ +-------------+ +---------------+ | | | | | | | | | --------------- ---------------------------- / \ / \ / 3 Mb/s Ethernet \ / ISI-NET \ \ net 10 / \ 128.9 / \ / \ / --------------- ---------------------------- | | | +--------------+ | ISI-TROLL | | Vax 11/750 | | Unix 4.1a | | Non-routing | | | | | | 9600 | ISI-TROLL, UCI-750A | | baud | and the link form a | | link | single logical gateway | | | | UCI-750A | | Vax 11/750 | | Unix 4.2 | +--------------+ | | | ---------------------- / \ / UCI-ICS \ \ 192.5.19 / \ / ---------------------- Figure 5-1: Simplified ISI Network Configuration RFC 911 17 EGP can either be conducted with ISI-Gateway across ARPANET or ISI-NET. 5.2.1 EGP Across ARPANET ISI-Hobgoblin will advise ISI-Gateway across ARPANET, and hence the core system, that it can reach ISI-NET and UCI-ICS. Packets from AS's exterior to ISI and destined for UCI-ICS will be routed via ISI-Gateway, ISI-Hobgoblin and ISI-Troll. The extra hop via ISI-Gateway (or other core EGP gateway) is because the core gateways do not currently pass on indirect-neighbor exterior gateway addresses in their IGP messages (Gateway-to-Gateway Protocol). Packets originating from UCI-ICS destined for exterior AS's will be routed via ISI-Troll and ISI-Gateway. Thus the incoming and out going packet routes are different. Packets originating from ISI-Hobgoblin as a host and destined for exterior AS's will be routed via the appropriate gateway on ARPANET. UCI-ICS can only communicate with exterior AS's if ISI-Troll, ISI-Hobgoblin and ISI-Gateway are all up. The dependence on ISI-Gateway could be eliminated if ISI-Troll routed packets via ISI-Hobgoblin rather than ISI-Gateway. However, as ISI-Hobgoblin is primarily a host and not a gateway it is preferable that ISI-Gateway route packets when possible. ISI-Hobgoblin can provide a back-up gateway function to ISI-Gateway as it can automatically switch to an alternative core EGP peer if ISI-Gateway goes down. Even though ISI-Hobgoblin normally advises the core system that it can reach ISI-NET the core uses its own internal route via ISI-Gateway in preference. For hosts on ISI-NET to correctly route outgoing packets they need their static gateway entries changed from ISI-Gateway to ISI-Hobgoblin. At present this would have to be done manually. This would only be appropriate if ISI-Gateway was going to be down for an extended period. 5.2.2 EGP Across ISI-NET ISI-Hobgoblin will advise ISI-Gateway across ISI-NET that its indirect neighbor, ISI-Troll, can reach UCI-ICS net. All exterior packet routing for UCI-ICS will be via ISI-Gateway in both directions with no hops via ISI-Hobgoblin. Packets originating from ISI-Hobgoblin as a host and destined for exterior AS's will be routed via ISI-Gateway, rather than the ARPANET interface, in both directions, thus taking an additional hop. UCI-ICS can only communicate with exterior AS's if ISI-Troll and ISI-Gateway are up and ISI-Hobgoblin has advised ISI-Gateway of the UCI-ICS route. If ISI-Hobgoblin goes down, communication will still be possible because ISI-gateway (and other core gateways) do not time out routes to indirect RFC 911 18 neighbors. If ISI-Gateway then goes down, it will need to be readvised by ISI-Hobgoblin of the UCI-ICS route, when it comes up. Conducting EGP over ISI-NET rather than ARPANET should provide more reliable service for UCI-ICS for the following reasons: ISI-Gateway is specifically designed as a gateway, it is expected to be up more than ISI-Hobgoblin, it is desirable to eliminate extra routing hops where possible and, the exterior routing information will persist after ISI-hobgoblin goes down. If ISI-Hobgoblin is to be used in its back-up mode, EGP could be restarted across ARPANET after the new gateway routes are manually installed in the hosts. Therefore, EGP across ISI-NET was selected as the preferred mode of operation. 5.2.3 Potential Routing Loop Because both ISI-Gateway and ISI-Hobgoblin provide routes between ARPANET and ISI-NET there is a potential routing loop. This topology in fact violates the original tree structure restriction. Provided ISI-Hobgoblin does not conduct EGP simultaneously with ISI-Gateway over ISI-NET and ARPANET, the gateways will only ever know about the alternative route from the shared EGP network and not from the other network. Thus a loop cannot occur. For instance, if EGP is conducted over ISI-NET, both ISI-Gateway and ISI-Hobgoblin will know about the alternative routes via each other to ARPANET from ISI-NET, but they will not know about the gateway addresses on ARPANET to be able to access ISI-NET from ARPANET. Thus they have insufficient routing data to be able to route packets in a loop between themselves. 5.3 Possible Future Configuration 5.3.1 Gateway to UCI-ICS An improvement in the reliability and performance of the service offered to UCI-ICS can be achieved by moving the UCI-ICS interface from ISI-Troll to ISI-Hobgoblin. Reliability will improve because the connection will only require ISI-Hobgoblin and its ARPANET interface to be up and performance will improve because the extra gateway hop will be eliminated. This will also allow EGP to be conducted across ARPANET giving access to the alternative core gateways running EGP. This will increase the chances of being able to reliably acquire an EGP neighbor at all times. It will also eliminate the extra hop via ISI-Gateway for packets originating from ISI-Hobgoblin, as a host, and destined for exterior networks. This configuration change will be made at sometime in the future. It was not done initially because ISI-Hobgoblin was experimental and down more often than ISI-Troll. RFC 911 19 5.3.2 Dynamic Switch to Backup Gateway It was noted in Section 5.2.1 that ISI-Hobgoblin can provide a backup gateway function to ISI-Gateway between ARPANET and ISI-NET. Such backup gateways could become a common approach to providing increased reliability. At present the change over to the backup gateway requires the new gateway route to be manually entered for hosts on ISI-NET. This section describes a possible method for achieving this changeover dynamically when the primary gateway goes down. The aim is to be able to detect when the primary gateway is down and have all hosts on the local network change to the backup gateway with a minimum amount of additional network traffic. The hosts should revert back to the primary gateway when it comes up again. The proposed method is for only the backup gateway to monitor the primary gateway status and for it to notify all hosts of the new gateway address when there is a change. 5.3.2.1 Usual Operation The backup gateway runs a process which sends reachability-probe messages, such as ICMP echoes, to the primary gateway every 30 seconds and uses the responses to determine reachability as for EGP. If the primary gateway goes down a "gateway-address message" indicating the backup gateway address is broadcast (or preferably multicast) to all hosts. When the primary gateway comes up another gateway message indicating the primary gateway address is broadcast. These broadcasts should be done four times at 30 second intervals to avoid the need for acknowledgements and knowledge of host addresses. Each host would run a process that listens for gateway-address messages. If a different gateway is advised it changes the default gateway entry to the new address. 5.3.2.2 Host Initialization When a host comes up the primary gateway could be down so it needs to be able to determine that it should use the backup gateway. The host could read the address of the primary and backup gateways from a static initialization file. It would then set its default gateway as the primary gateway and send a "gateway-request message" to the backup gateway requesting the current gateway address. The backup gateway would respond with a gateway-address message. If no response is received the gateway-request should be retransmitted three times at 30 second intervals. If no response is received the backup gateway can be assumed down and the primary gateway retained as the default. Whenever the backup gateway comes up it broadcasts a gateway-address message. Alternatively, a broadcast (or multicast) gateway-request message could be RFC 911 20 defined to which only gateways would respond. The backup gateway-address message needs to indicate that it is the backup gateway so that future requests need not be broadcast. Again, three retransmissions should be used. But the primary gateway also needs to broadcast its address whenever it comes up. 5.3.2.3 When Both the Primary and Backup are Down If the primary gateway is down and the backup knows it is going down, it should broadcast gateway-address messages indicating the primary gateway in case the primary gateway comes up first. But the backup could go down without warning and the primary come up before it. If the primary gateway broadcasts a gateway-address message when it comes up there is no problem. Otherwise, while hosts are using the backup gateway they should send a gateway-request message every 10 minutes. If no response is received it should be retransmitted 3 times at 30 second intervals and if still no response the backup assumed down and the primary gateway reverted to. Thus the only time hosts need to send messages periodically is when the primary gateway does not send gateway-address messages on coming up and the backup gateway is being used. In some cases, such as at ISI, the primary gateway is managed by a different organization and experimental features cannot be conveniently added. 5.3.2.4 Unix 4.2 BSD One difficulty with the above is that there is no standard method of specifying internet broadcast or multicast addresses. Multicast addressing is preferable as only those participating need process the message (interfaces with hardware multicast detection are available). In the case of Unix 4.2 BSD an internet address with zero local address is assumed for the internet broadcast address. However, the general Internet Addressing policy is to use an all ones value to indicate a broadcast function. On Unix 4.2 BSD systems, both the gateway and host processes could be run at the user level so that kernel modifications are not required. A User Datagram Protocol (UDP) socket could be reserved for host-backup-gateway communication. Super user access to raw sockets for sending and receiving ICMP Echo messages requires a minor modification to the internet-family protocol switch table. RFC 911 21 6. ACKNOWLEDGEMENT I acknowledge with thanks the many people who have helped me with this project, but in particular, Dave Mills, who suggested the project, Jon Postel for discussion and encouragement, Liza Martin for providing the initial EGP code, Berkeley for providing the "routed" code, Mike Brescia for assistance with testing, Telecom Australia for funding me, and ISI for providing facilities. RFC 911 22 7. REFERENCES [Berkeley 83] "Unix Programmer's Manual", Vol. 1, 4.2 Berkeley Software Distribution, University of California, Berkeley. [Kirton 84] Kirton, P.A., "EGP Gateway Under Berkeley Unix 4.2", University of Southern California, Information Sciences Institute, Research Report ISI/RR-84-145, to be published. [Mills 83] Mills, D.L., "EGP Models and Self-Organizing Systems" Message to EGP-PEOPLE@BBN-UNIX, Nov. 1983. [Mills 84a] Mills, D.L., "Exterior Gateway Protocol Formal Specification", Network Information Center RFC 904, April 1984. [Mills 84b] Mills, D.L., "Revised EGP Model Clarified and Discussed", Message to EGP-PEOPLE@BBN-UNIX, May 1984. [Postel 84] Postel, J., "Exterior Gateway Protocol Implementation Schedule" Network Information Center RFC 890, Feb. 1984. [Rose 84] Rose, M.T., "Low-Tech Connection into the ARPA-Internet: The Raw-Packet Split Gateway", Department of Information and Computer Science, University of California, Irvine, Technical Report 216, Feb. 1984. [Rosen 82] Rosen, E.C., "Exterior Gateway Protocol", Network Information Center RFC 827, Oct. 1982. [Seamonson & Rosen 84] Seamonson, L.J. and Rosen, E.C., "Stub Exterior Gateway Protocol", Network Information Center RFC 888, Jan. 84. [Xerox 81] "Internet Transport Protocols", Xerox System Integration Standard XSIS 028112, Dec. 1981.