Africa’s growing importance is reflected in the intensifying competition with China and other countries for both access to African resources and influence in this region. A more comprehensive U.S. policy toward Africa is needed, the report states, and it lays out recommendations for policymakers to craft that policy.
I haven’t read this yet, but the synopsis above seems to speak to and refute the common notion of Africa as a poor continent in need of the world’s pity and benevolence…
Take a look….
More Than Humanitarianism
Monday, October 22, 2007
The Secret History of the Impending War with Iran That the White House Doesn't Want You to Know
Two former high-ranking policy experts from the Bush Administration say the U.S. has been gearing up for a war with Iran for years, despite claiming otherwise. It'll be Iraq all over again.
In the years after 9/11, Flynt Leverett and Hillary Mann worked at the highest levels of the Bush administration as Middle East policy experts for the National Security Council. Mann conducted secret negotiations with Iran. Leverett traveled with Colin Powell and advised Condoleezza Rice. They each played crucial roles in formulating policy for the region leading up to the war in Iraq. But when they left the White House, they left with a growing sense of alarm -- not only was the Bush administration headed straight for war with Iran, it had been set on this course for years. That was what people didn't realize. It was just like Iraq, when the White House was so eager for war it couldn't wait for the UN inspectors to leave. The steps have been many and steady and all in the same direction. And now things are getting much worse. We are getting closer and closer to the tripline, they say.
"The hard-liners are upping the pressure on the State Department," says Leverett. "They're basically saying, 'You've been trying to engage Iran for more than a year now and what do you have to show for it? They keep building more centrifuges, they're sending this IED stuff over into Iraq that's killing American soldiers, the human-rights internal political situation has gotten more repressive -- what the hell do you have to show for this engagement strategy?' "
But the engagement strategy was never serious and was designed to fail, they say. Over the last year, Rice has begun saying she would talk to "anybody, anywhere, anytime," but not to the Iranians unless they stopped enriching uranium first. That's not a serious approach to diplomacy, Mann says. Diplomacy is about talking to your enemies. That's how wars are averted. You work up to the big things. And when U.S. ambassador to Iraq Ryan Crocker had his much-publicized meeting with his Iranian counterpart in Baghdad this spring, he didn't even have permission from the White House to schedule a second meeting.
The most ominous new development is the Bush administration's push to name the Iranian Revolutionary Guards a terrorist organization.
"The U.S. has designated any number of states over the years as state sponsors of terrorism," says Leverett. "But here for the first time the U.S. is saying that part of a government is itself a terrorist organization."
This is what Leverett and Mann fear will happen: The diplomatic effort in the United Nations will fail when it becomes clear that Russia's and China's geopolitical ambitions will not accommodate the inconvenience of energy sanctions against Iran. Without any meaningful incentive from the U.S. to be friendly, Iran will keep meddling in Iraq and installing nuclear centrifuges. This will trigger a response from the hard-liners in the White House, who feel that it is their moral duty to deal with Iran before the Democrats take over American foreign policy. "If you get all those elements coming together, say in the first half of '08," says Leverett, "what is this president going to do? I think there is a serious risk he would decide to order an attack on the Iranian nuclear installations and probably a wider target zone."
This would result in a dramatic increase in attacks on U.S. forces in Iraq, attacks by proxy forces like Hezbollah, and an unknown reaction from the wobbly states of Afghanistan and Pakistan, where millions admire Iran's resistance to the Great Satan. "As disastrous as Iraq has been," says Mann, "an attack on Iran could engulf America in a war with the entire Muslim world."
Mann and Leverett believe that none of this had to be.
Flynt Lawrence Leverett grew up in Fort Worth and went to Texas Christian University. He spent the first nine years of his government career as a CIA analyst specializing in the Middle East. He voted for George Bush in 2000. On the day the assassins of Al Qaeda flew two hijacked airplanes into the World Trade Center, Colin Powell summoned him to help plan the response. Five months later, Leverett landed a plum post on the National Security Council. When Condoleezza Rice discussed the Middle East with President Bush and Donald Rumsfeld, Leverett was the man standing behind her taking notes and whispering in her ear.
Today, he sits on the back deck of a house tucked into the curve of a leafy suburban street in McLean, Virginia, a forty-nine-year-old white American man wearing khakis and a white dress shirt and wire-rimmed glasses. Mann sits next to him, also wearing khakis. She's thirty-nine but looks much younger, with straight brown hair and a tomboy's open face. The polish on her toenails is pink. If you saw her around McLean, you wouldn't hesitate:
Soccer mom. Classic soccer mom.
But with degrees from Brandeis and Harvard Law and stints at Tel Aviv University and the powerful Israeli lobby known as AIPAC, she has even better right-wing credentials than her husband.
As they talk, eating grapes out of a bowl, lawn mowers hum and birds chirp. The floor is littered with toy trucks and rubber animals left behind by the youngest of their four children. But the tranquillity is misleading. When Mann and Leverett went public with the inside story behind the impending disaster with Iran, the White House dismissed them. Then it imposed prior restraint on them, an extraordinary episode of government censorship. Finally, it threatened them.
Now they are afraid of the White House, and watching what they say. But still, they feel they have to speak out.
Like so many things these days, this story began on the morning of September 11, 2001. On Forty-fifth Street in Manhattan, Mann had just been evacuated from the offices of the U.S. mission to the United Nations and was walking home to her apartment on Thirty-eighth Street -- walking south, toward the giant plume of smoke. When her cell phone rang, she picked it up immediately because her sister worked at the World Trade Center and she was frantic for word. But it wasn't her sister, it was a senior Iranian diplomat. To protect him from reprisals from the Iranian government, she doesn't want to name him, but she describes him as a cultured man in his fifties with salt-and-pepper hair. Since early spring, they had been meeting secretly in a small conference room at the UN.
"Are you all right?" he asked.
Yes, she said, she was fine.
The attack was a terrible tragedy, he said, doubtless the work of Al Qaeda.
"I hope that we can still work together," he said.
That same day, in Washington, on the seventh floor of the State Department building, a security guard opened the door of Leverett's office and told him they were evacuating the building. Leverett was Powell's specialist on terrorist states like Syria and Libya, so he knew the world was about to go through a dramatic change. As he joined the people milling on the sidewalk, his mind was already racing.
Then he got a call summoning him back to Foggy Bottom. At the entrance to a specially fortified office, he showed his badge to the guards and passed into a windowless conference room. There were about a dozen people there, Powell's top foreign-policy planners. Powell told them that their first job was to make plans to capture or kill Osama bin Laden. The second job was to rally allies. That meant detailed strategies for approaching other nations -- in some cases, Powell could make the approach, in others the president would have to make the call. Then Powell left them to work through the night.
At 5:30 a.m. on September 12, they walked the list to the office of the deputy secretary of state, Richard Armitage. Powell took it straight to the White House.
Mann and Leverett didn't know each other then, but they were already traveling down parallel tracks. Months before September 11, Mann had been negotiating with the Iranian diplomat at the UN. After the attacks, the meetings continued, sometimes alone and sometimes with their Russian counterpart sitting in. Soon they traded the conference room for the Delegates' Lounge, an airy two-story bar with ashtrays for all the foreigners who were used to smoking indoors. One day, up on the second floor where the windows overlooked the East River, the diplomat told her that Iran was ready to cooperate unconditionally, a phrase that had seismic diplomatic implications. Unconditional talks are what the U.S. had been demanding as a precondition to any official diplomatic contact between the U.S. and Iran. And it would be the first chance since the Islamic revolution for any kind of rapprochement. "It was revolutionary," Mann says. "It could have changed the world."
A few weeks later, after signing on to Condoleezza Rice's staff as the new Iran expert in the National Security Council, Mann flew to Europe with Ryan Crocker -- then a deputy assistant secretary of state -- to hold talks with a team of Iranian diplomats. Meeting in a light-filled conference room at the old UN building in Geneva, they hammered out plans for Iranian help in the war against the Taliban. The Iranians agreed to provide assistance if any American was shot down near their territory, agreed to let the U.S. send food in through their border, and even agreed to restrain some "really bad Afghanis," like a rabidly anti-American warlord named Gulbuddin Hekmatyar, quietly putting him under house arrest in Tehran. These were significant concessions. At the same time, special envoy James Dobbins was having very public and warm discussions in Bonn with the Iranian deputy foreign minister as they worked together to set up a new government for Afghanistan. And the Iranians seemed eager to help in more tactical ways as well. They had intimate knowledge of Taliban strategic capabilities and they wanted to share it with the Americans.
One day during the U.S. bombing campaign, Mann and her Iranian counterparts were sitting around the wooden conference table speculating about the future Afghan constitution. Suddenly the Iranian who knew so much about intelligence matters started pounding on the table. "Enough of that!" he shouted, unfurling a map of Afghanistan. Here was a place the Americans needed to bomb. And here, and here, he angrily jabbed his finger at the map.
Leverett spent those days in his office at the State Department building, watching the revolution in the Middle East and coming up with plans on how to capture the lightning. Suddenly countries like Syria and Libya and Sudan and Iran were coming forward with offers of help, which raised a vital question -- should they stay on the same enemies list as North Korea and Iraq, or could there be a new slot for "friendly" sponsors of terror?
As a CIA analyst, Leverett had come to the view that Middle Eastern terrorism was more tactical than religious. Syria wanted the Golan Heights back and didn't have the military strength to put up a serious fight against Israel, so it relied on "asymmetrical methods." Accepting this idea meant that nations like Syria weren't locked in a fanatic mind-set, that they could evolve to use new methods, so Leverett told Powell to seize the moment and draw up a "road map" to peace for the problem countries of the Middle East -- expel your terrorist groups and stop trying to develop weapons of mass destruction, and we will take you off the sponsors-of-terrorism list and start a new era of cooperation.
That December, just after the triumph over Afghanistan, Powell took the idea to the White House. The occasion was the regular "deputies meeting" at the Situation Room. Gathered around the table were the deputy secretary of state, the deputy secretary of defense, the deputy director of the CIA, a representative from Vice-President Cheney's office, and also the deputy national security advisor, Stephen Hadley.
Hadley hated the idea. So did the representatives from Rumsfeld and Cheney. They thought that it was a reward for bad behavior, that the sponsors of terrorism should stop just because it's the right thing to do.
After the meeting, Hadley wrote up a brief memo that came to be known as Hadley's Rules:
If a state like Syria or Iran offers specific assistance, we will take it without offering anything in return. We will accept it without strings or promises. We won't try to build on it.
Leverett thought that was simply nutty. To strike postures of moral purity, they were throwing away a chance for real progress. But just a few days later, Condoleezza Rice called him into her office, warming him up with talk of how classical music shaped their childhoods. As he told her about the year he spent studying classical piano at the Liszt Academy in Budapest, Leverett felt a real connection. Then she said she was looking for someone to take the job of senior director of Mideast affairs at the National Security Council, someone who would take a real leadership role on the Palestinian issue. Big changes were coming in 2002.
He repeated his firm belief that the White House had to draw up a road map with real solutions to the division of Jerusalem and the problem of refugees, something with final borders. That was the only remedy to the crisis in the Middle East.
Just after the New Year, Rice called and offered him the job.
The bowl of grapes is empty and the plate of cheese moves to the center of the table. Leverett's teenage son comes in with questions about a teacher. Periodically, Mann interrupts herself. "This is off the record," she says. "This is going to have to be on background."
She's not allowed to talk about confidential documents or intelligence matters, but the topic of her negotiations with the Iranians is especially touchy.
"As far as they're concerned, the whole idea that there were talks is something I shouldn't even be talking about," she says.
All ranks and ranking are out. "They don't want there to be anything about the level of the talks or who was involved."
"They won't even let us say something like 'senior' or 'important,' 'high-ranking,' or 'high-level,' " Leverett says.
But the important thing is that the Iranians agreed to talk unconditionally, Mann says. "They specifically told me time and again that they were doing this because they understood the impact of this attack on the U.S., and they thought that if they helped us unconditionally, that would be the way to change the dynamic for the first time in twenty-five years."
She believed them.
But while Leverett was still moving into the Old Executive Office Building next to the White House, Mann was wrapped up in the crisis over a ship called the Karine A that left Iran loaded with fifty tons of weapons. According to the Israeli navy, which intercepted the Karine A in the Red Sea, it was headed for the PLO. In staff meetings at the White House, Mann argued for caution. The Iranian government probably didn't even know about the arms shipments. It was issuing official denials in the most passionate way, even sending its deputy foreign minister onto Fox News to say "categorically" that "all segments of the Iranian government" had nothing to do with the arms shipment, which meant the "total government, not simply President Khatami's administration."
Bush waited. Three weeks later, it was time for his 2002 State of the Union address. Mann spent the morning in a meeting with Condoleezza Rice and the new president of Afghanistan, Hamid Karzai, who kept asking Rice for an expanded international peacekeeping force. Rice kept saying that the Afghans would have to solve their own problems. Then they went off to join the president's motorcade and Mann headed back to her office to watch the speech on TV.
That was the speech in which Bush linked Iran to Iraq and North Korea with a memorable phrase:
"States like these, and their terrorist allies, constitute an axis of evil, arming to threaten the peace of the world."
The Iranians had been engaging in high-level diplomacy with the American government for more than a year, so the phrase was shocking and profound.
After that, the Iranian diplomats skipped the monthly meeting in Geneva. But they came again in March. And so did Mann. "They said they had put their necks out to talk to us and they were taking big risks with their careers and their families and their lives," Mann says.
The secret negotiations with Iran continued, every month for another year.
Leverett plunged right into a dramatic new peace proposal floated by Crown Prince Abdullah of Saudi Arabia. Calling for "full normalization" in exchange for "full withdrawal" from the occupied territories, Abdullah promised to rally all the Arab nations to a final settlement with Israel. In his brand-new third-floor office at the Old Executive Office Building, a tiny room with a very high ceiling, Leverett began hammering out the details with Abdullah's foreign-policy advisor, Adel Al-Jubeir. When Ariel Sharon said that a return to the '67 borders was unacceptable, Al-Jubeir said the Saudis didn't want to be in the "real estate business" -- if the Palestinians agreed to border modifications, the Saudis could hardly refuse them. Al-Jubeir believed he had something that might actually work.
But the White House wasn't interested. Sharon already rejected it, Rice told Leverett.
At the Arab League meeting, Abdullah got every Arab state to sign his proposal in a unanimous vote.
The White House still wasn't interested.
Then violence in the Palestinian territories began to increase, climaxing in an Israeli siege of Arafat's compound. In April, Leverett accompanied Colin Powell on a tour that took them from Morocco to Egypt and Jordan and Lebanon and finally Israel. Twice they crossed the Israeli-army lines to visit Arafat under siege. Powell seemed to think he had authorization from the White House to explore what everyone was calling "political horizons," the safely vague shorthand for a peaceful future, so on the final day Leverett holed up in a suite at the David Citadel Hotel in Jerusalem with a group of senior American officials -- the U.S. ambassador to Israel, the U.S. consul general to Jerusalem, assistant secretary of state for Near Eastern affairs Bill Burns -- trying to hammer out Powell's last speech.
Then the phone rang. It was Stephen Hadley on the phone from the White House. "Tell Powell he is not authorized to talk about a political horizon," he said. "Those are formal instructions."
"This is a bad idea," Leverett remembers saying. "It's bad policy and it's also humiliating for Powell, who has been talking to heads of state about this very issue for the last ten days."
"It doesn't matter," Hadley said. "There's too much resistance from Rumsfeld and the VP. Those are the instructions."
So Leverett went back into the suite and asked Powell to step aside.
Powell was furious, Leverett remembers. "What is it they're afraid of?" he demanded. "Who the hell are they afraid of?"
"I don't know, sir," Leverett said.
In the spring, Crown Prince Abdullah flew to Texas to meet Bush at his ranch. The way Leverett remembers the story, Abdullah sat down and told Bush he was going to ask a direct question and wanted a direct answer. Are you going to do anything about the Palestinian issue? If you tell me no, if it's too difficult, if you're not going to give it that kind of priority, just tell me. I will understand and I will never say anything critical of you or your leadership in public, but I'm going to need to make my own judgments and my own decisions about Saudi interests.
Bush tried to stall, saying he understood his concerns and would see what he could do.
Abdullah stood up. "That's it. This meeting is over."
No Arab leader had ever spoken to Bush like that before, Leverett says. But Saudi Arabia was a key ally in the war on terror, vital to the continued U.S. oil supply, so Bush and Rice and Powell excused themselves into another room for a quick huddle.
When he came back, Bush gave Abdullah his word that he would deal seriously with the Palestinian issue.
"Okay," Abdullah said. "The president of the United States has given me his word."
So the meeting continued, ending with a famous series of photographs of Bush and Abdullah riding around the ranch in Bush's pickup.
In a meeting at the White House a few days later, Leverett saw Powell shaking his head over Abdullah's threat. He called it "the near-death experience."
Bush rolled his eyes. "We sure don't want to go through anything like that again."
Then the king of Jordan came to Washington to see Bush. There had to be a road map for peace in Palestine, the king said. Despite the previous experience with Abdullah in Crawford, Bush seemed taken by surprise, Leverett remembers, but he listened and said that the idea of a road map seemed pretty reasonable.
So suddenly they were working on a road map. For moderate Arab states, the hope of a two-state solution would offer some political cover before Washington embarked on any invasion of Iraq. In a meeting with the king of Jordan, Leverett made a personal promise that it would be out by the end of 2002.
But nothing happened. In Cheney's and Rumsfeld's offices, opposition came from men like John Hannah, Doug Feith, and Scooter Libby. In Rice's office, there was Elliott Abrams. Again they said that negotiation was just a reward for bad behavior. First the Palestinians had to reject terrorism and practice democracy.
Finally, it was a bitter-cold day just after Thanksgiving and Leverett was on a family trip to the Washington Zoo, standing in front of the giraffe enclosure. The White House patched through a call from the foreign minister of Jordan, Marwan Muasher, who said that Rice had just told him the road map was off. "Do you have any idea how this has pulled the rug out from under us, from under me?" Muasher said. "I'm the one that has to go into Arab League meetings and get beat up and say, 'No, there's going to be a plan out by the end of the year.' How can we ever trust you again?"
On Monday, Leverett went straight to Rice's office for an explanation. She told him that Ariel Sharon had called early elections in Israel and asked Bush to shelve any Palestinian plan. This time Leverett couldn't hide his exasperation. "You told the whole world you were going to put this out before Christmas," he said. "Because one Israeli politician told you it's going to make things politically difficult for him, you don't put it out? Do you realize how hard that makes things for all our Arab partners?"
Rice sat impassively behind her broad desk. "If we put the road map out," she said, "it will interfere with Israeli elections."
"You are interfering with Israeli elections, just in another way."
"Flynt, the decision has already been made," Rice said.
There was also an awkward scene with the secretary of defense. They were in the Situation Room and Leverett was sitting behind Rice taking notes when suddenly Rumsfeld addressed him directly. "Why are you laughing? Did I say something funny?"
The room went silent, and Rumsfeld asked it again.
"Why are you laughing? Did I say something funny?"
"I'm sorry, Mr. Secretary, I don't think I know what you're talking about."
"It looks to me like you were laughing," Rumsfeld said.
"No sir. I'm sorry if I gave that impression. I was just listening to the meeting and taking notes. Didn't mean to disturb you."
The meeting continued, message received.
By that time, Leverett and Mann had met and fallen in love. They got married in February 2003, went to Florida on their honeymoon, and got back just in time for the Shock and Awe bombing campaign. Leverett quit his NSC job in disgust. Mann rotated back to the State Department.
Then came the moment that would lead to an extraordinary battle with the Bush administration. It was an average morning in April, about four weeks into the war. Mann picked up her daily folder and sat down at her desk, glancing at a fax cover page. The fax was from the Swiss ambassador to Iran, which wasn't unusual -- since the U.S. had no formal relationship with Iran, the Swiss ambassador represented American interests there and often faxed over updates on what he was doing. This time he'd met with Sadeq Kharrazi, a well-connected Iranian who was the nephew of the foreign minister and son-in-law to the supreme leader. Amazingly, Kharrazi had presented the ambassador with a detailed proposal for peace in the Middle East, approved at the highest levels in Tehran.
A two-page summary was attached. Scanning it, Mann was startled by one dramatic concession after another -- "decisive action" against all terrorists in Iran, an end of support for Hamas and the Islamic Jihad, a promise to cease its nuclear program, and also an agreement to recognize Israel.
This was huge. Mann sat down and drafted a quick memo to her boss, Richard Haass. It was important to send a swift and positive response.
Then she heard that the White House had already made up its mind -- it was going to ignore the offer. Its only response was to lodge a formal complaint with the Swiss government about their ambassador's meddling.
A few days after that, a terrorist attack in Saudi Arabia killed thirty-four people, including eight Americans, and an intelligence report said the bombers had been in phone contact with Al Qaeda members in Iran. Although it was unknown whether Tehran had anything to do with the bombing or if the terrorists were hiding out in the lawless areas near the border, Rumsfeld set the tone for the administration's response at his next press conference. "There's no question but that there have been and are today senior Al Qaeda leaders in Iran, and they are busy."
Colin Powell saw Mann's memo. A couple weeks later he approached her at a State Department reception and said, "It was a very good memo. I couldn't sell it at the White House."
In response to questions from Esquire, Colin Powell called Leverett "very able" and confirmed much of what he said. Leverett's account of the clash between Bush and Crown Prince Abdullah was accurate, he said. "It was a very serious moment and no one wanted to see if the Saudis were bluffing." The same goes for the story about his speech in Israel in 2002. "I had major problems with the White House on what I wanted to say."
On the subject of the peace offer, though, Powell was defensive. "I talked to all of my key assistants since Flynt started talking about an Iranian grand bargain, but none of us recall seeing this initiative as a grand bargain."
On the general subject of negotiations with Iran, he responded with pointed politesse. "We talked to the Iranians quietly up until 2003. The president chose not to continue that channel."
That is putting it mildly. In May of 2003, when the U.S. was still in the triumphant "mission accomplished" phase of the Iraq war, word started filtering out of the White House about an aggressive new Iran policy that would include efforts to destabilize the Iranian government and even to promote a popular uprising. In his first public statement on Iran policy since leaving the NSC, Leverett told The Washington Post he thought the White House was making a dangerous mistake. "What it means is we will end up with an Iran that has nuclear weapons and no dialogue with the United States."
In the years that followed, he spoke out in dozens of newspaper editorials and a book, all making variations on the same argument -- America's approach to rogue nations was all sticks and no carrots, all economic sanctions and threats of war without any dialogue. "To bring about real change," he argued, "we must also offer concrete benefits." Of course states like Iran and Syria messed around in Iraq, he said. Iran was supporting the Iraqi opposition when the U.S. was still supporting Saddam Hussein. It was insane to expect them to stop when the goal of a Shiite Iraq was finally in reach. The only way to solve the underlying issues was to offer Iran a "grand bargain" that would recognize the legitimacy of Iran's government and its right to a role in the region.
But that was an unthinkable thought. The White House ignored him. Democrats ignored him. The Brookings Institution declined to renew his contract.
Then he started talking about the peace offer. By then it was 2006 and the war wasn't going well and suddenly people started to respond: You mean Iran isn't evil? They helped fight the Taliban? They wanted to make peace? He summed it all up in a long paper for a Washington think tank that happened to be scheduled for publication last November, a vulnerable time for the White House, just after the Democrats swept the midterm elections and the Iraq Study Group released its report calling for negotiations with Syria and Iran. When he submitted the paper to the CIA for a routine review, they told him the CIA had no problem with it but someone from the NSC called to complain. "You shouldn't have cleared this without letting the White House take a look at it," the official said.
Leverett told them he wasn't going to let White House operatives judge his criticisms of White House operatives and distilled his argument into an op-ed piece for The New York Times. This time he shared a byline with his wife, who had experienced the peace offer up close. They submitted their first draft to the CIA and the State Department on a Sunday in early December, expecting to hear back the next day.
The next morning, Leverett gave a blistering talk on Bush's Iran policy to the influential conservatives at the Cato Institute. The speech was carried live on C-SPAN. Later that day, he flew to New York and made the same arguments at a private dinner with the UN ambassadors of Russia and Britain. He was starting to have an impact.
By Tuesday, he still hadn't heard from the CIA review board.
They called on Wednesday and told him that there was nothing classified in the piece as far as the agency was concerned, but someone in the West Wing wasn't happy with it and would be redacting large sections.
"You're the clearing agency," Leverett said. "You're the people named in my agreement."
They said their hands were tied.
After consulting a lawyer, Leverett and Mann and a researcher worked through the night to assemble a list of public sources where the blacked-out material had already been published. They also took out one line that might have been based on a classified document.
But the White House wouldn't budge. It was a First Amendment showdown.
On Thursday, Leverett and Mann decided to publish the piece with large sections of type blacked out, 168 words in all. Since the piece had been rendered pretty much incomprehensible, they included a list of public sources. "To make sense of our op-ed article, readers will have to look up the citations themselves."
As they tell their story, Mann rushes off to pick up one of their sons from a play date and Leverett takes over, telling what happened over the following months:
Bush sent a second carrier group to the Persian Gulf.
U.S. troops started to arrest Iranians living in Baghdad, accusing them of working with insurgents.
Bush accused Iran of "providing material support" for attacks on U.S. forces, a formulation that suggested a legal justification for a preemptive attack.
Senator Jim Webb of Virginia pushed through an amendment requiring Bush to get congressional authorization for an attack.
Colin Powell broke his long silence with a pointed warning. "You can't negotiate when you tell the other side, 'Give us what a negotiation would produce before the negotiations start.' "
Even Henry Kissinger started giving interviews on the need to "exhaust every possibility to come to an understanding with Iran."
From inside the White House, Leverett was hearing a scary scenario: The Russians were scheduled to ship fuel rods to the Iranian nuclear reactor in Bushehr, which meant the reactor would become operational by this November, at which point it would be impossible to bomb -- the fallout alone would turn the city into an urban Chernobyl. The White House was seriously considering a preemptive attack when the Russians cooled things down by saying Iran hadn't paid its bills, so they would hold back the Bushehr fuel rods for a while.
That put things into a summer lull. But by August, tensions were rising again. U.S. troops in Baghdad arrested an official delegation of Iranian energy experts, leading them out of a hotel in blindfolds and handcuffs. Then Iran said that it had paid its bills and that the Russians were ready to deliver the Bushehr shipment. In Time magazine, former CIA officer and author Robert Baer quoted a highly placed White House official:
"IEDs are a casus belli for this administration. There will be an attack on Iran."
Mann steps back out on the deck and starts collecting the scattered toys to prepare the house for a dinner party, the typical modern American mother multitasking her way through a busy day. "The reason I have to be so careful now is that I'm legally on notice and they will prosecute things that I say or do," she says, picking up a plastic truck.
"Because of that one article?"
"Yeah."
Outside, it's getting warmer. There's a heavy haze and floating bugs and for a moment it feels a bit ominous, a gathering silence, one of those moments when giant pods start to sprout in local basements.
"We're tired," Mann says. "Nobody listens."
It seems inconceivable to her that once again a war could be coming, and once again no one is listening. Another pair of lawn mowers joins the chorus and the spell breaks. A cab pulls in the driveway. The caterer comes to prepare for the dinner guests.
Find this article at: http://www.esquire.com/features/iranbriefing1107
But the engagement strategy was never serious and was designed to fail, they say. Over the last year, Rice has begun saying she would talk to "anybody, anywhere, anytime," but not to the Iranians unless they stopped enriching uranium first. That's not a serious approach to diplomacy, Mann says. Diplomacy is about talking to your enemies. That's how wars are averted. You work up to the big things. And when U.S. ambassador to Iraq Ryan Crocker had his much-publicized meeting with his Iranian counterpart in Baghdad this spring, he didn't even have permission from the White House to schedule a second meeting.
The most ominous new development is the Bush administration's push to name the Iranian Revolutionary Guards a terrorist organization.
"The U.S. has designated any number of states over the years as state sponsors of terrorism," says Leverett. "But here for the first time the U.S. is saying that part of a government is itself a terrorist organization."
This is what Leverett and Mann fear will happen: The diplomatic effort in the United Nations will fail when it becomes clear that Russia's and China's geopolitical ambitions will not accommodate the inconvenience of energy sanctions against Iran. Without any meaningful incentive from the U.S. to be friendly, Iran will keep meddling in Iraq and installing nuclear centrifuges. This will trigger a response from the hard-liners in the White House, who feel that it is their moral duty to deal with Iran before the Democrats take over American foreign policy. "If you get all those elements coming together, say in the first half of '08," says Leverett, "what is this president going to do? I think there is a serious risk he would decide to order an attack on the Iranian nuclear installations and probably a wider target zone."
This would result in a dramatic increase in attacks on U.S. forces in Iraq, attacks by proxy forces like Hezbollah, and an unknown reaction from the wobbly states of Afghanistan and Pakistan, where millions admire Iran's resistance to the Great Satan. "As disastrous as Iraq has been," says Mann, "an attack on Iran could engulf America in a war with the entire Muslim world."
Mann and Leverett believe that none of this had to be.
Flynt Lawrence Leverett grew up in Fort Worth and went to Texas Christian University. He spent the first nine years of his government career as a CIA analyst specializing in the Middle East. He voted for George Bush in 2000. On the day the assassins of Al Qaeda flew two hijacked airplanes into the World Trade Center, Colin Powell summoned him to help plan the response. Five months later, Leverett landed a plum post on the National Security Council. When Condoleezza Rice discussed the Middle East with President Bush and Donald Rumsfeld, Leverett was the man standing behind her taking notes and whispering in her ear.
Today, he sits on the back deck of a house tucked into the curve of a leafy suburban street in McLean, Virginia, a forty-nine-year-old white American man wearing khakis and a white dress shirt and wire-rimmed glasses. Mann sits next to him, also wearing khakis. She's thirty-nine but looks much younger, with straight brown hair and a tomboy's open face. The polish on her toenails is pink. If you saw her around McLean, you wouldn't hesitate:
Soccer mom. Classic soccer mom.
But with degrees from Brandeis and Harvard Law and stints at Tel Aviv University and the powerful Israeli lobby known as AIPAC, she has even better right-wing credentials than her husband.
As they talk, eating grapes out of a bowl, lawn mowers hum and birds chirp. The floor is littered with toy trucks and rubber animals left behind by the youngest of their four children. But the tranquillity is misleading. When Mann and Leverett went public with the inside story behind the impending disaster with Iran, the White House dismissed them. Then it imposed prior restraint on them, an extraordinary episode of government censorship. Finally, it threatened them.
Now they are afraid of the White House, and watching what they say. But still, they feel they have to speak out.
Like so many things these days, this story began on the morning of September 11, 2001. On Forty-fifth Street in Manhattan, Mann had just been evacuated from the offices of the U.S. mission to the United Nations and was walking home to her apartment on Thirty-eighth Street -- walking south, toward the giant plume of smoke. When her cell phone rang, she picked it up immediately because her sister worked at the World Trade Center and she was frantic for word. But it wasn't her sister, it was a senior Iranian diplomat. To protect him from reprisals from the Iranian government, she doesn't want to name him, but she describes him as a cultured man in his fifties with salt-and-pepper hair. Since early spring, they had been meeting secretly in a small conference room at the UN.
"Are you all right?" he asked.
Yes, she said, she was fine.
The attack was a terrible tragedy, he said, doubtless the work of Al Qaeda.
"I hope that we can still work together," he said.
That same day, in Washington, on the seventh floor of the State Department building, a security guard opened the door of Leverett's office and told him they were evacuating the building. Leverett was Powell's specialist on terrorist states like Syria and Libya, so he knew the world was about to go through a dramatic change. As he joined the people milling on the sidewalk, his mind was already racing.
Then he got a call summoning him back to Foggy Bottom. At the entrance to a specially fortified office, he showed his badge to the guards and passed into a windowless conference room. There were about a dozen people there, Powell's top foreign-policy planners. Powell told them that their first job was to make plans to capture or kill Osama bin Laden. The second job was to rally allies. That meant detailed strategies for approaching other nations -- in some cases, Powell could make the approach, in others the president would have to make the call. Then Powell left them to work through the night.
At 5:30 a.m. on September 12, they walked the list to the office of the deputy secretary of state, Richard Armitage. Powell took it straight to the White House.
Mann and Leverett didn't know each other then, but they were already traveling down parallel tracks. Months before September 11, Mann had been negotiating with the Iranian diplomat at the UN. After the attacks, the meetings continued, sometimes alone and sometimes with their Russian counterpart sitting in. Soon they traded the conference room for the Delegates' Lounge, an airy two-story bar with ashtrays for all the foreigners who were used to smoking indoors. One day, up on the second floor where the windows overlooked the East River, the diplomat told her that Iran was ready to cooperate unconditionally, a phrase that had seismic diplomatic implications. Unconditional talks are what the U.S. had been demanding as a precondition to any official diplomatic contact between the U.S. and Iran. And it would be the first chance since the Islamic revolution for any kind of rapprochement. "It was revolutionary," Mann says. "It could have changed the world."
A few weeks later, after signing on to Condoleezza Rice's staff as the new Iran expert in the National Security Council, Mann flew to Europe with Ryan Crocker -- then a deputy assistant secretary of state -- to hold talks with a team of Iranian diplomats. Meeting in a light-filled conference room at the old UN building in Geneva, they hammered out plans for Iranian help in the war against the Taliban. The Iranians agreed to provide assistance if any American was shot down near their territory, agreed to let the U.S. send food in through their border, and even agreed to restrain some "really bad Afghanis," like a rabidly anti-American warlord named Gulbuddin Hekmatyar, quietly putting him under house arrest in Tehran. These were significant concessions. At the same time, special envoy James Dobbins was having very public and warm discussions in Bonn with the Iranian deputy foreign minister as they worked together to set up a new government for Afghanistan. And the Iranians seemed eager to help in more tactical ways as well. They had intimate knowledge of Taliban strategic capabilities and they wanted to share it with the Americans.
One day during the U.S. bombing campaign, Mann and her Iranian counterparts were sitting around the wooden conference table speculating about the future Afghan constitution. Suddenly the Iranian who knew so much about intelligence matters started pounding on the table. "Enough of that!" he shouted, unfurling a map of Afghanistan. Here was a place the Americans needed to bomb. And here, and here, he angrily jabbed his finger at the map.
Leverett spent those days in his office at the State Department building, watching the revolution in the Middle East and coming up with plans on how to capture the lightning. Suddenly countries like Syria and Libya and Sudan and Iran were coming forward with offers of help, which raised a vital question -- should they stay on the same enemies list as North Korea and Iraq, or could there be a new slot for "friendly" sponsors of terror?
As a CIA analyst, Leverett had come to the view that Middle Eastern terrorism was more tactical than religious. Syria wanted the Golan Heights back and didn't have the military strength to put up a serious fight against Israel, so it relied on "asymmetrical methods." Accepting this idea meant that nations like Syria weren't locked in a fanatic mind-set, that they could evolve to use new methods, so Leverett told Powell to seize the moment and draw up a "road map" to peace for the problem countries of the Middle East -- expel your terrorist groups and stop trying to develop weapons of mass destruction, and we will take you off the sponsors-of-terrorism list and start a new era of cooperation.
That December, just after the triumph over Afghanistan, Powell took the idea to the White House. The occasion was the regular "deputies meeting" in the Situation Room. Gathered around the table were the deputy secretary of state, the deputy secretary of defense, the deputy director of the CIA, a representative from Vice-President Cheney's office, and also the deputy national security advisor, Stephen Hadley.
Hadley hated the idea. So did the representatives from Rumsfeld and Cheney. They thought that it was a reward for bad behavior, that the sponsors of terrorism should stop just because it's the right thing to do.
After the meeting, Hadley wrote up a brief memo that came to be known as Hadley's Rules:
If a state like Syria or Iran offers specific assistance, we will take it without offering anything in return. We will accept it without strings or promises. We won't try to build on it.
Leverett thought that was simply nutty. To strike postures of moral purity, they were throwing away a chance for real progress. But just a few days later, Condoleezza Rice called him into her office, warming him up with talk of how classical music shaped their childhoods. As he told her about the year he spent studying classical piano at the Liszt Academy in Budapest, Leverett felt a real connection. Then she said she was looking for someone to take the job of senior director of Mideast affairs at the National Security Council, someone who would take a real leadership role on the Palestinian issue. Big changes were coming in 2002.
He repeated his firm belief that the White House had to draw up a road map with real solutions to the division of Jerusalem and the problem of refugees, something with final borders. That was the only remedy to the crisis in the Middle East.
Just after the New Year, Rice called and offered him the job.
The bowl of grapes is empty and the plate of cheese moves to the center of the table. Leverett's teenage son comes in with questions about a teacher. Periodically, Mann interrupts herself. "This is off the record," she says. "This is going to have to be on background."
She's not allowed to talk about confidential documents or intelligence matters, but the topic of her negotiations with the Iranians is especially touchy.
"As far as they're concerned, the whole idea that there were talks is something I shouldn't even be talking about," she says.
All ranks and ranking are out. "They don't want there to be anything about the level of the talks or who was involved."
"They won't even let us say something like 'senior' or 'important,' 'high-ranking,' or 'high-level,' " Leverett says.
But the important thing is that the Iranians agreed to talk unconditionally, Mann says. "They specifically told me time and again that they were doing this because they understood the impact of this attack on the U.S., and they thought that if they helped us unconditionally, that would be the way to change the dynamic for the first time in twenty-five years."
She believed them.
But while Leverett was still moving into the Old Executive Office Building next to the White House, Mann was wrapped up in the crisis over a ship called the Karine A that left Iran loaded with fifty tons of weapons. According to the Israeli navy, which intercepted the Karine A in the Red Sea, it was headed for the PLO. In staff meetings at the White House, Mann argued for caution. The Iranian government probably didn't even know about the arms shipments. It was issuing official denials in the most passionate way, even sending its deputy foreign minister onto Fox News to say "categorically" that "all segments of the Iranian government" had nothing to do with the arms shipment, which meant the "total government, not simply President Khatami's administration."
Bush waited. Three weeks later, it was time for his 2002 State of the Union address. Mann spent the morning in a meeting with Condoleezza Rice and the new president of Afghanistan, Hamid Karzai, who kept asking Rice for an expanded international peacekeeping force. Rice kept saying that the Afghans would have to solve their own problems. Then they went off to join the president's motorcade and Mann headed back to her office to watch the speech on TV.
That was the speech in which Bush linked Iran to Iraq and North Korea with a memorable phrase:
"States like these, and their terrorist allies, constitute an axis of evil, arming to threaten the peace of the world."
The Iranians had been engaging in high-level diplomacy with the American government for more than a year, so the phrase was shocking and profound.
After that, the Iranian diplomats skipped the monthly meeting in Geneva. But they came again in March. And so did Mann. "They said they had put their necks out to talk to us and they were taking big risks with their careers and their families and their lives," Mann says.
The secret negotiations with Iran continued, every month for another year.
Leverett plunged right into a dramatic new peace proposal floated by Crown Prince Abdullah of Saudi Arabia. Calling for "full normalization" in exchange for "full withdrawal" from the occupied territories, Abdullah promised to rally all the Arab nations to a final settlement with Israel. In his brand-new third-floor office at the Old Executive Office Building, a tiny room with a very high ceiling, Leverett began hammering out the details with Abdullah's foreign-policy advisor, Adel Al-Jubeir. When Ariel Sharon said that a return to the '67 borders was unacceptable, Al-Jubeir said the Saudis didn't want to be in the "real estate business" -- if the Palestinians agreed to border modifications, the Saudis could hardly refuse them. Al-Jubeir believed he had something that might actually work.
But the White House wasn't interested. Sharon already rejected it, Rice told Leverett.
At the Arab League meeting, Abdullah got every Arab state to sign his proposal in a unanimous vote.
The White House still wasn't interested.
Then violence in the Palestinian territories began to increase, climaxing in an Israeli siege of Arafat's compound. In April, Leverett accompanied Colin Powell on a tour that took them from Morocco to Egypt and Jordan and Lebanon and finally Israel. Twice they crossed the Israeli-army lines to visit Arafat under siege. Powell seemed to think he had authorization from the White House to explore what everyone was calling "political horizons," the safely vague shorthand for a peaceful future, so on the final day Leverett holed up in a suite at the David Citadel Hotel in Jerusalem with a group of senior American officials -- the U.S. ambassador to Israel, the U.S. consul general to Jerusalem, assistant secretary of state for Near Eastern affairs Bill Burns -- trying to hammer out Powell's last speech.
Then the phone rang. It was Stephen Hadley on the phone from the White House. "Tell Powell he is not authorized to talk about a political horizon," he said. "Those are formal instructions."
"This is a bad idea," Leverett remembers saying. "It's bad policy and it's also humiliating for Powell, who has been talking to heads of state about this very issue for the last ten days."
"It doesn't matter," Hadley said. "There's too much resistance from Rumsfeld and the VP. Those are the instructions."
So Leverett went back into the suite and asked Powell to step aside.
Powell was furious, Leverett remembers. "What is it they're afraid of?" he demanded. "Who the hell are they afraid of?"
"I don't know, sir," Leverett said.
In the spring, Crown Prince Abdullah flew to Texas to meet Bush at his ranch. The way Leverett remembers the story, Abdullah sat down and told Bush he was going to ask a direct question and wanted a direct answer. Are you going to do anything about the Palestinian issue? If you tell me no, if it's too difficult, if you're not going to give it that kind of priority, just tell me. I will understand and I will never say anything critical of you or your leadership in public, but I'm going to need to make my own judgments and my own decisions about Saudi interests.
Bush tried to stall, saying he understood his concerns and would see what he could do.
Abdullah stood up. "That's it. This meeting is over."
No Arab leader had ever spoken to Bush like that before, Leverett says. But Saudi Arabia was a key ally in the war on terror, vital to the continued U.S. oil supply, so Bush and Rice and Powell excused themselves into another room for a quick huddle.
When he came back, Bush gave Abdullah his word that he would deal seriously with the Palestinian issue.
"Okay," Abdullah said. "The president of the United States has given me his word."
So the meeting continued, ending with a famous series of photographs of Bush and Abdullah riding around the ranch in Bush's pickup.
In a meeting at the White House a few days later, Leverett saw Powell shaking his head over Abdullah's threat. He called it "the near-death experience."
Bush rolled his eyes. "We sure don't want to go through anything like that again."
Then the king of Jordan came to Washington to see Bush. There had to be a road map for peace in Palestine, the king said. Despite the previous experience with Abdullah in Crawford, Bush seemed taken by surprise, Leverett remembers, but he listened and said that the idea of a road map seemed pretty reasonable.
So suddenly they were working on a road map. For moderate Arab states, the hope of a two-state solution would offer some political cover before Washington embarked on any invasion of Iraq. In a meeting with the king of Jordan, Leverett made a personal promise that it would be out by the end of 2002.
But nothing happened. In Cheney's and Rumsfeld's offices, opposition came from men like John Hannah, Doug Feith, and Scooter Libby. In Rice's office, there was Elliott Abrams. Again they said that negotiation was just a reward for bad behavior. First the Palestinians had to reject terrorism and practice democracy.
Finally, it was a bitter-cold day just after Thanksgiving and Leverett was on a family trip to the Washington Zoo, standing in front of the giraffe enclosure. The White House patched through a call from the foreign minister of Jordan, Marwan Muasher, who said that Rice had just told him the road map was off. "Do you have any idea how this has pulled the rug out from under us, from under me?" Muasher said. "I'm the one that has to go into Arab League meetings and get beat up and say, 'No, there's going to be a plan out by the end of the year.' How can we ever trust you again?"
On Monday, Leverett went straight to Rice's office for an explanation. She told him that Ariel Sharon had called early elections in Israel and asked Bush to shelve any Palestinian plan. This time Leverett couldn't hide his exasperation. "You told the whole world you were going to put this out before Christmas," he said. "Because one Israeli politician told you it's going to make things politically difficult for him, you don't put it out? Do you realize how hard that makes things for all our Arab partners?"
Rice sat impassively behind her broad desk. "If we put the road map out," she said, "it will interfere with Israeli elections."
"You are interfering with Israeli elections, just in another way."
"Flynt, the decision has already been made," Rice said.
There was also an awkward scene with the secretary of defense. They were in the Situation Room and Leverett was sitting behind Rice taking notes when suddenly Rumsfeld addressed him directly. "Why are you laughing? Did I say something funny?"
The room went silent, and Rumsfeld asked it again.
"Why are you laughing? Did I say something funny?"
"I'm sorry, Mr. Secretary, I don't think I know what you're talking about."
"It looks to me like you were laughing," Rumsfeld said.
"No sir. I'm sorry if I gave that impression. I was just listening to the meeting and taking notes. Didn't mean to disturb you."
The meeting continued, message received.
By that time, Leverett and Mann had met and fallen in love. They got married in February 2003, went to Florida on their honeymoon, and got back just in time for the Shock and Awe bombing campaign. Leverett quit his NSC job in disgust. Mann rotated back to the State Department.
Then came the moment that would lead to an extraordinary battle with the Bush administration. It was an average morning in April, about four weeks into the war. Mann picked up her daily folder and sat down at her desk, glancing at a fax cover page. The fax was from the Swiss ambassador to Iran, which wasn't unusual -- since the U.S. had no formal relationship with Iran, the Swiss ambassador represented American interests there and often faxed over updates on what he was doing. This time he'd met with Sadeq Kharrazi, a well-connected Iranian who was the nephew of the foreign minister and son-in-law to the supreme leader. Amazingly, Kharrazi had presented the ambassador with a detailed proposal for peace in the Middle East, approved at the highest levels in Tehran.
A two-page summary was attached. Scanning it, Mann was startled by one dramatic concession after another -- "decisive action" against all terrorists in Iran, an end of support for Hamas and the Islamic Jihad, a promise to cease its nuclear program, and also an agreement to recognize Israel.
This was huge. Mann sat down and drafted a quick memo to her boss, Richard Haass. It was important to send a swift and positive response.
Then she heard that the White House had already made up its mind -- it was going to ignore the offer. Its only response was to lodge a formal complaint with the Swiss government about their ambassador's meddling.
A few days after that, a terrorist attack in Saudi Arabia killed thirty-four people, including eight Americans, and an intelligence report said the bombers had been in phone contact with Al Qaeda members in Iran. Although it was unknown whether Tehran had anything to do with the bombing or if the terrorists were hiding out in the lawless areas near the border, Rumsfeld set the tone for the administration's response at his next press conference. "There's no question but that there have been and are today senior Al Qaeda leaders in Iran, and they are busy."
Colin Powell saw Mann's memo. A couple weeks later he approached her at a State Department reception and said, "It was a very good memo. I couldn't sell it at the White House."
In response to questions from Esquire, Colin Powell called Leverett "very able" and confirmed much of what he said. Leverett's account of the clash between Bush and Crown Prince Abdullah was accurate, he said. "It was a very serious moment and no one wanted to see if the Saudis were bluffing." The same went for the story about his speech in Israel in 2002. "I had major problems with the White House on what I wanted to say."
On the subject of the peace offer, though, Powell was defensive. "I talked to all of my key assistants since Flynt started talking about an Iranian grand bargain, but none of us recall seeing this initiative as a grand bargain."
On the general subject of negotiations with Iran, he responded with pointed politesse. "We talked to the Iranians quietly up until 2003. The president chose not to continue that channel."
That is putting it mildly. In May of 2003, when the U.S. was still in the triumphant "mission accomplished" phase of the Iraq war, word started filtering out of the White House about an aggressive new Iran policy that would include efforts to destabilize the Iranian government and even to promote a popular uprising. In his first public statement on Iran policy since leaving the NSC, Leverett told The Washington Post he thought the White House was making a dangerous mistake. "What it means is we will end up with an Iran that has nuclear weapons and no dialogue with the United States."
In the years that followed, he spoke out in dozens of newspaper editorials and a book, all making variations on the same argument -- America's approach to rogue nations was all sticks and no carrots, all economic sanctions and threats of war without any dialogue. "To bring about real change," he argued, "we must also offer concrete benefits." Of course states like Iran and Syria messed around in Iraq, he said. Iran was supporting the Iraqi opposition when the U.S. was still supporting Saddam Hussein. It was insane to expect them to stop when the goal of a Shiite Iraq was finally in reach. The only way to solve the underlying issues was to offer Iran a "grand bargain" that would recognize the legitimacy of Iran's government and its right to a role in the region.
But that was an unthinkable thought. The White House ignored him. Democrats ignored him. The Brookings Institution declined to renew his contract.
Then he started talking about the peace offer. By then it was 2006 and the war wasn't going well and suddenly people started to respond: You mean Iran isn't evil? They helped fight the Taliban? They wanted to make peace? He summed it all up in a long paper for a Washington think tank that happened to be scheduled for publication last November, a vulnerable time for the White House, just after the Democrats swept the midterm elections and the Iraq Study Group released its report calling for negotiations with Syria and Iran. When he submitted the paper to the CIA for a routine review, they told him the CIA had no problem with it but someone from the NSC called to complain. "You shouldn't have cleared this without letting the White House take a look at it," the official said.
That put things into a summer lull. But by August, tensions were rising again. U.S. troops in Baghdad arrested an official delegation of Iranian energy experts, leading them out of a hotel in blindfolds and handcuffs. Then Iran said that it had paid its bills and that the Russians were ready to deliver the Bushehr shipment. In Time magazine, former CIA officer and author Robert Baer quoted a highly placed White House official:
"IEDs are a casus belli for this administration. There will be an attack on Iran."
Mann steps back out on the deck and starts collecting the scattered toys to prepare the house for a dinner party, the typical modern American mother multitasking her way through a busy day. "The reason I have to be so careful now is that I'm legally on notice and they will prosecute things that I say or do," she says, picking up a plastic truck.
"Because of that one article?"
"Yeah."
Outside, it's getting warmer. There's a heavy haze and floating bugs and for a moment it feels a bit ominous, a gathering silence, one of those moments when giant pods start to sprout in local basements.
"We're tired," Mann says. "Nobody listens."
It seems inconceivable to her that once again a war could be coming, and once again no one is listening. Another pair of lawn mowers joins the chorus and the spell breaks. A cab pulls in the driveway. The caterer comes to prepare for the dinner guests.
Find this article at: http://www.esquire.com/features/iranbriefing1107
The basics of programming embedded processors: Part 3
By Wayne Wolf , Embedded.com
Aug 7 2007 (0:15 AM)
URL: http://www.embedded.com/showArticle.jhtml?articleID=201203685
It is useful to understand how a high-level language program is translated into instructions. First, since implementing an embedded computing system often requires controlling the instruction sequences used to handle interrupts, placement of data and instructions in memory, and so forth, understanding how the compiler works can help you know when you cannot rely on the compiler.
Next, because many applications are also performance sensitive, understanding how code is generated can help you meet your performance goals, either by writing high-level code that gets compiled into the instructions you want or by recognizing when you must write your own assembly code.
Compilation = translation + optimization. The high-level language program is translated into the lower-level form of instructions; optimizations try to generate better instruction sequences than would be possible if the brute-force technique of independently translating source code statements were used.
Optimization techniques focus on more of the program to ensure that compilation decisions that appear to be good for one statement are not unnecessarily problematic for other parts of the program.
The compilation process is summarized in Figure 5-12 below. Compilation begins with high-level language code such as C and generally produces assembly code. (Directly producing object code simply duplicates the functions of an assembler, which is a very desirable stand-alone program to have.)
The high-level language program is parsed to break it into statements and expressions. In addition, a symbol table is generated, which includes all the named objects in the program. Some compilers may then perform higher-level optimizations that can be viewed as modifying the high-level language program input without reference to instructions.
Figure 5-12. The compilation process.
Simplifying arithmetic expressions is one example of a machine-independent optimization. Not all compilers do such optimizations, and compilers can vary widely regarding which combinations of machine-independent optimizations they do perform. Instruction-level optimizations are aimed at generating code.
They may work directly on real instructions or on a pseudoinstruction format that is later mapped onto the instructions of the target CPU. This level of optimization also helps modularize the compiler by allowing code generation to create simpler code that is later optimized. For example, consider the following array access code:
x[i] = c*x[i];
A simple code generator would generate the address for x[i] twice, once for each appearance in the statement. The later optimization phases can recognize this as an example of common expressions that need not be duplicated.
While in this simple case it would be possible to create a code generator that never generated the redundant expression, taking into account every such optimization at code generation time is very difficult. We get better code and more reliable compilers by generating simple code first and then optimizing it.
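A minimal C sketch of what common-subexpression elimination effectively does to the statement above (function names are illustrative, not from the original text):

```c
/* Unoptimized form: the address of x[i] is computed for the load
   and again for the store. */
void scale_naive(int *x, int i, int c) {
    x[i] = c * x[i];
}

/* What the optimizer effectively produces: the common address
   expression is computed once and held in a temporary. */
void scale_cse(int *x, int i, int c) {
    int *p = &x[i];   /* compute the address of x[i] once */
    *p = c * *p;      /* reuse it for both the load and the store */
}
```

Both versions compute the same result; the second simply makes the shared address computation explicit.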
Statement Translation
In this section, we consider the basic job of translating the high-level language program with little or no optimization. Let's first consider how to translate an expression.
A large amount of the code in a typical application consists of arithmetic and logical expressions. Understanding how to compile a single expression, as described in Example 5-2 below, is a good first step in understanding the entire compilation process.
Example 5-2 Compiling an Arithmetic Expression
In the following arithmetic expression,
a*b + 5*(c - d)
the expression is written in terms of program variables. On some machines we may be able to perform memory-to-memory arithmetic directly on the locations corresponding to those variables. However, on many machines, such as the ARM, we must first load the variables into registers. This requires choosing which registers receive not only the named variables but also intermediate results such as (c - d).
The code for the expression can be built by walking the data flow graph. The data flow graph for the expression appears below.
The temporary variables for the intermediate values and final result have been named w, x, y, and z. To generate code, we walk from the tree's root (where z, the final result, is generated) by traversing the nodes in post order. During the walk, we generate instructions to cover the operation at every node. The path is presented below.
The nodes are numbered in the order in which code is generated. Since every node in the data flow graph corresponds to an operation that is directly supported by the instruction set, we simply generate an instruction at every node. Since we are making an arbitrary register assignment, we can use up the registers in order starting with r1. The resulting ARM code follows:
; operator 1 (*)
ADR r4,a      ; get address for a
LDR r1,[r4]   ; load a
ADR r4,b      ; get address for b
LDR r2,[r4]   ; load b
MUL r3,r1,r2  ; put w into r3
; operator 2 (-)
ADR r4,c      ; get address for c
LDR r4,[r4]   ; load c
ADR r5,d      ; get address for d
LDR r5,[r5]   ; load d
SUB r6,r4,r5  ; put x into r6
; operator 3 (*)
MOV r1,#5     ; load the constant 5 (r1 is free again)
MUL r7,r6,r1  ; put y into r7
; operator 4 (+)
ADD r8,r7,r3  ; put z into r8
The equivalent SHARC code appears below.
! operator 1 (*)
R1 = DM(a);    ! load a
R2 = DM(b);    ! load b
R3 = R1 * R2;  ! compute w = a * b
! operator 2 (-)
R4 = DM(c);    ! load c
R5 = DM(d);    ! load d
R6 = R4 - R5;  ! compute x = c - d
! operator 3 (*)
R7 = 5;
R8 = R6 * R7;  ! compute y
! operator 4 (+)
R9 = R3 + R8;  ! compute z
One obvious optimization is to reuse a register whose value is no longer needed. In the case of the intermediate values w, x, and y, we know that they cannot be used after the end of the expression (e.g., in another expression) since they have no name in the C program. However, the final result z may in fact be used in a C assignment and the value reused later in the program. In this case we would need to know when the register is no longer needed to determine its best use.
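The same data flow walk can be written as straight-line C, with one temporary per node, mirroring the one-register-per-value assignment used in the assembly above (the function name is illustrative):

```c
/* Straight-line evaluation of a*b + 5*(c - d), one temporary per
   data-flow node, in the order the postorder walk visits them. */
int expr(int a, int b, int c, int d) {
    int w = a * b;   /* operator 1: w, the assembly's r3 */
    int x = c - d;   /* operator 2: x, the assembly's r6 */
    int y = 5 * x;   /* operator 3: y, the assembly's r7 */
    int z = w + y;   /* operator 4: z, the final result */
    return z;
}
```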
Figure 5-13. Flow of control in C and control flow diagrams
In the previous example, we made an arbitrary allocation of variables to registers for simplicity. When we have large programs with multiple expressions, we must allocate registers more carefully since CPUs have a limited number of registers. We will consider register allocation later.
We also need to be able to translate control structures. Since conditionals are controlled by expressions, the code generation techniques of the last example can be used for those expressions, leaving us with the task of generating code for the flow of control itself.
Figure 5-13 above shows a simple example of changing flow of control in C - an if statement, in which the condition controls whether the true or false branch of the if is taken. Figure 5-13 also shows the control flow diagram for the if statement. Example 5-3 illustrates how to implement conditionals in assembly language.
Example 5-3 Generating Code for a Conditional
Consider the following C statement:
if (a + b > 0) x = 5; else x = 7;
The CDFG for the statement appears below.
We know how to generate the code for the expressions. We can generate the control flow code by walking the CDFG. One ordered walk through the CDFG follows:
To generate code, we must assign a label to the first instruction at the end of a directed edge and create a branch for each edge that does not go to the immediately following instruction. The exact steps to be taken at the branch points depend on the target architecture.
On some machines, evaluating expressions generates condition codes that we can test in subsequent branches, and on other machines we must use test-and-branch instructions. ARM allows us to test condition codes, so we get the following ARM code for the 1-2-3 walk:
        ADR r5,a       ; get address for a
        LDR r1,[r5]    ; load a
        ADR r5,b       ; get address for b
        LDR r2,[r5]    ; load b
        ADDS r3,r1,r2  ; compute a + b, setting the condition codes
        BLE label3     ; true condition falls through branch
        ; true case
        MOV r3,#5      ; load constant
        ADR r5,x       ; get address of x
        STR r3,[r5]    ; store value into x
        B stmtend      ; done with the true case
        ; false case
label3  MOV r3,#7      ; load constant
        ADR r5,x       ; get address of x
        STR r3,[r5]    ; store value into x
stmtend ...
The 1-2 and 3-4 edges do not require a branch and label because they are straight-line code. In contrast, the 1-3 and 2-4 edges do require a branch and a label for the target.
Since expressions are generally created as straight-line code, they typically require careful consideration of the order in which the operations are executed. We have much more freedom when generating conditional code because the branches ensure that the flow of control goes to the right block of code. If we walk the CDFG in a different order and lay out the code blocks in a different order in memory, we still get valid code as long as we properly place branches.
Drawing a control flow graph based on the while form of the loop helps us understand how to translate it into instructions.
C compilers can generate assembler source (using the -S flag on gcc and many other compilers), which some compilers intersperse with the C code. Such code is a very good way to learn about both assembly language programming and compilation. Web Aid 5-Assembly has examples of assembly output from C compilers for the ARM and SHARC.
Procedures
Another major code generation problem is the creation of procedures. Generating code for procedures is relatively straightforward once we know the procedure linkage appropriate for the CPU. At the procedure definition, we generate the code to handle the procedure call and return. At each call of the procedure, we set up the procedure parameters and make the call.
The CPU's subroutine call mechanism is usually not sufficient to directly support procedures in modern programming languages. The procedure linkage mechanism provides a way for the program to pass parameters into the procedure and for the procedure to return a value.
It also provides help in restoring the values of registers that the procedure has modified. All procedures in a given programming language use the same linkage mechanism (although different languages may use different linkages). The mechanism can also be used to call handwritten assembly language routines from compiled code.
Procedure stacks are typically built to grow down from high addresses. A stack pointer (sp) defines the end of the current frame, while a frame pointer (fp) defines the end of the last frame. (The fp is technically necessary only if the stack frame can be grown by the procedure during execution.)
The procedure can refer to an element in the frame by addressing relative to sp. When a new procedure is called, the sp and fp are modified to push another frame onto the stack.
The ARM Procedure Call Standard (APCS) is a good illustration of a typical procedure linkage mechanism. Although the stack frames are in main memory, understanding how registers are used is key to understanding the mechanism, as explained below.
r0-r3 are used to pass parameters into the procedure. r0 is also used to hold the return value. If more than four parameters are required, they are put on the stack frame.
r4-r7 hold register variables.
r11 is the frame pointer and r13 is the stack pointer.
r10 holds the limiting address on stack size, which is used to check for stack overflows.
Other registers have additional uses in the protocol.
Figure 5-14. Layout of a one-dimensional array in memory
Data Structures
The compiler must also translate references to data structures into references to raw memories. In general, this requires address computations. Some of these computations can be done at compile time while others must be done at run time. Arrays are interesting because the address of an array element must in general be computed at run time, since the array index may change. Let us first consider one-dimensional arrays:
a[i]
The layout of the array in memory is shown in Figure 5-14 above. The zeroth element is stored as the first element of the array, the first element directly below, and so on. We can create a pointer for the array that points to the array's head, namely, a[0]. If we call that pointer aptr for convenience, then we can rewrite the reading of a[i] as
*(aptr + i)
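A minimal C sketch of this equivalence (the function name is illustrative); note that C pointer arithmetic scales the index by the element size automatically, just as the compiler's address computation does:

```c
/* a[i] and *(aptr + i) denote the same memory location:
   the base address plus i elements. */
int read_elem(const int *aptr, int i) {
    return *(aptr + i);   /* identical to aptr[i] */
}
```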
Two-dimensional arrays are more challenging. There are multiple possible ways to lay out a two-dimensional array in memory, as shown in Figure 5-15 below. In the form C uses, known as row major, the inner variable of the array (j in a[i,j]) varies most quickly. (Fortran uses a different organization known as column major.)
Two-dimensional arrays also require more sophisticated addressing - in particular, we must know the size of the array. Let us consider the row-major form. If the a[] array is of size N × M, then we can turn the two-dimensional array access into a one-dimensional array access. Thus,
a[i,j]
becomes
a[i*M + j]
where the maximum value for j is M - 1.
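This flattening can be sketched in C (the names read2d and M are illustrative, not from the original text):

```c
#define M 4   /* number of columns in the row-major array */

/* Row-major flattening: element (i, j) of an N x M array lives
   at offset i*M + j from the array's base address. */
int read2d(const int *base, int i, int j) {
    return *(base + i*M + j);
}
```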
Figure 5-15. Memory layout for two-dimensional arrays
A C struct is easier to address. As shown in Figure 5-16 below, a structure is implemented as a contiguous block of memory. Fields in the structure can be accessed using constant offsets to the base address of the structure. In this example, if field1 is four bytes long, then field2 can be accessed as
*(aptr + 4)
This addition can usually be done at compile time, requiring only the indirection itself to fetch the memory location during execution.
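The constant-offset access can be sketched in C; the struct and function names are illustrative, and offsetof yields exactly the compile-time offset the text describes:

```c
#include <stddef.h>   /* offsetof */

struct rec {
    int field1;       /* assume a 4-byte first field, as in the text */
    int field2;
};

/* field2 is reached by adding its constant offset to the base
   address of the structure; only the final indirection happens
   at run time. */
int read_field2(const struct rec *r) {
    const char *base = (const char *)r;
    return *(const int *)(base + offsetof(struct rec, field2));
}
```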
Expression Simplification
Expression simplification is a useful area for machine-independent transformations. We can use the laws of algebra to simplify expressions. Consider the following expression:
a*b + a*c
We can use the distributive law to rewrite the expression as
a*(b + c)
Figure 5-16. C structure layout and access.
Since the new expression has only two operations rather than three for the original form, it is almost certainly cheaper, because it is both faster and smaller. Such transformations make some broad assumptions about the relative cost of operations.
In some cases, simple generalizations about the cost of operations may be misleading. For example, a CPU with a multiply-and-accumulate instruction may be able to do a multiply and addition as cheaply as it can do an addition. However, such situations can often be taken care of in code generation.
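The two forms of the expression can be checked against each other with a small C sketch (function names are illustrative):

```c
/* Original form: two multiplies and one add. */
int expr_orig(int a, int b, int c) {
    return a*b + a*c;
}

/* After applying the distributive law: one multiply and one add. */
int expr_simplified(int a, int b, int c) {
    return a*(b + c);
}
```

For integer operands the two functions always agree, which is what makes the rewrite a safe machine-independent transformation.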
We can also use the laws of arithmetic to further simplify expressions on constants. Consider the following C statement:
for (i = 0; i < 8 + 1; i++)
We can simplify 8 + 1 to 9 at compile time - there is no need to perform that arithmetic while the program is executing. Why would a program ever contain expressions that evaluate to constants? Using named constants rather than numbers is good programming practice and often leads to constant expressions. The original form of the for statement could have been
for (i = 0; i < NOPS + 1; i++)
where, for example, the added 1 takes care of a trailing null character.
Dead Code Elimination
Code that will never be executed can be safely removed from the program. The general problem of identifying code that will never be executed is difficult, but there are some important special cases where it can be done. Programmers will intentionally introduce dead code in certain situations.
Consider this C code fragment:
#define DEBUG 0
...
if (DEBUG) print_debug_stuff();
In the above case, the print_debug_stuff() function is never executed, but the code allows the programmer to override the preprocessor variable definition (perhaps with a compile-time flag) to enable the debugging code.
This case is easy to analyze because the condition is the constant 0, which C uses for the false condition. Since there is no else clause in the if statement, the compiler can totally eliminate the if statement, rewriting the CDFG to provide a direct edge between the statements before and after the if.
Some dead code may be introduced by the compiler. For example, certain optimizations introduce copy statements that copy one variable to another. If uses of the first variable can be replaced by references to the second one, then the copy statement becomes dead code that can be eliminated.
Procedure Inlining
Another machine-independent transformation that requires a little more evaluation is procedure inlining. An inlined procedure does not have a separate procedure body and procedure linkage; rather, the body of the procedure is substituted in place for the procedure call. Figure 5-17 below shows an example of function inlining in C.
The C++ programming language provides an inline construct that tells the compiler to generate inline code for a function. In this case, an inlined procedure is generated in expanded form whenever possible. However, inlining is not always the best thing to do. It does eliminate the procedure linkage instructions. However, in a cached system, having multiple copies of the function body may actually slow down the fetches of these instructions. Inlining also increases code size, and memory may be precious.
Function definition:
int foo(int a, int b, int c) { return a + b - c; }
Function call:
z = foo(w,x,y);
Inlining result:
z = w + x - y;
Figure 5-17. Function inlining in C.
Loop Transformations
Loops are important program structures—although they are compactly described in the source code, they often use a large fraction of the computation time. Many techniques have been designed to optimize loops.
A simple but useful transformation is known as loop unrolling, which is illustrated in Example 5-4. Loop unrolling is important because it helps expose parallelism that can be used by later stages of the compiler.
Example 5-4 Loop Unrolling
A simple loop in C follows:
for (i = 0; i < N; i++) { a[i] = b[i]*c[i]; }
This loop is executed a fixed number of times, namely, N. A straightforward implementation of the loop would create and initialize the loop variable i, update its value on every iteration, and test it to see whether to exit the loop. However, since the loop is executed a fixed number of times, we can generate more direct code.
If we let N = 4, then we can replace the loop above with the following straight-line code:
a[0] = b[0]*c[0];
a[1] = b[1]*c[1];
a[2] = b[2]*c[2];
a[3] = b[3]*c[3];
This unrolled code has no loop overhead code at all, that is, no iteration variable and no tests. But the unrolled loop has the same problems as the inlined procedure—it may interfere with the cache and expands the amount of code required.
We do not, of course, have to fully unroll loops. Rather than unroll the above loop four times, we could unroll it twice. The following code results:
for (i = 0; i < 2; i++) {
    a[i*2] = b[i*2]*c[i*2];
    a[i*2 + 1] = b[i*2 + 1]*c[i*2 + 1];
}
In this case, since all operations in the two lines of the loop body are independent, later stages of the compiler may be able to generate code that allows them to be executed efficiently on the CPU's pipeline.
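The rolled and unrolled-by-two versions can be checked against each other with a small C sketch (function names are illustrative; N is assumed even for the unrolled form):

```c
#define N 4

/* Original rolled loop. */
void mult_rolled(int *a, const int *b, const int *c) {
    for (int i = 0; i < N; i++)
        a[i] = b[i] * c[i];
}

/* Unrolled by a factor of two: half the iteration-variable
   updates and tests, two independent statements per iteration. */
void mult_unrolled2(int *a, const int *b, const int *c) {
    for (int i = 0; i < N/2; i++) {
        a[i*2]     = b[i*2]     * c[i*2];
        a[i*2 + 1] = b[i*2 + 1] * c[i*2 + 1];
    }
}
```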
Loop fusion combines two or more loops into a single loop. For this transformation to be legal, two conditions must be satisfied. First, the loops must iterate over the same values. Second, the loop bodies must not have dependencies that would be violated if they are executed together—for example, if the second loop's ith iteration depends on the results of the i + 1th iteration of the first loop, the two loops cannot be combined. Loop distribution is the opposite of loop fusion, that is, decomposing a single loop into multiple loops.
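A minimal C sketch of loop fusion (function and array names are illustrative); the fusion is legal here because the two bodies touch disjoint arrays, so no dependency is violated:

```c
#define N 8

/* Before fusion: two loops over the same index range, each with
   its own loop overhead. */
void before_fusion(int *a, int *b) {
    for (int i = 0; i < N; i++) a[i] = a[i] + 1;
    for (int i = 0; i < N; i++) b[i] = b[i] * 2;
}

/* After fusion: one loop, one set of loop overhead. */
void after_fusion(int *a, int *b) {
    for (int i = 0; i < N; i++) {
        a[i] = a[i] + 1;
        b[i] = b[i] * 2;
    }
}
```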
Loop tiling breaks up a loop into a set of nested loops, with each inner loop performing the operations on a subset of the data. An example is shown in Figure 5-18. Here, each loop is broken up into tiles of size two. Each loop is split into two loops—for example, the inner ii loop iterates within the tile and the outer i loop iterates across the tiles.
The result is that the pattern of accesses across the a array is drastically different - rather than walking across one row in its entirety, the code walks through rows and columns following the tile structure. Loop tiling changes the order in which array elements are accessed, thereby allowing us to better control the behavior of the cache during loop execution.
Figure 5-18. Loop tiling.
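The splitting described above can be sketched in C (names and the tile size are illustrative; N is assumed to be a multiple of TILE):

```c
#define N 4
#define TILE 2

/* Tiled traversal of an N x N array: each original loop is split
   into an outer loop across tiles (i, j) and an inner loop within
   a tile (ii, jj), changing the order in which elements are
   touched and hence the cache behavior. */
void tiled_copy(int dst[N][N], const int src[N][N]) {
    for (int i = 0; i < N; i += TILE)
        for (int j = 0; j < N; j += TILE)
            for (int ii = i; ii < i + TILE; ii++)
                for (int jj = j; jj < j + TILE; jj++)
                    dst[ii][jj] = src[ii][jj];
}
```

Every element is still visited exactly once; only the visiting order changes.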
We can also modify the arrays being indexed in loops. Array padding adds dummy data elements to an array in order to change its layout in the cache. Although these array locations will not be used, they do change how the useful array elements fall into cache lines. Judicious padding can in some cases significantly reduce the number of cache conflicts during loop execution.
Next in Part 4: The creation of procedures
To read Part 1, go to "Program design and analysis."
To read Part 2, go to "Models of program, assemblers and linkers."
Used with the permission of the publisher, Newnes/Elsevier, this series of six articles is based on copyrighted material from "Computers as Components: Principles of Embedded Computer System Design" by Wayne Wolf. The book can be purchased on line.
Wayne Wolf is currently the Georgia Research Alliance Eminent Scholar holding the Rhesa "Ray" S. Farmer, Jr., Distinguished Chair in Embedded Computer Systems at Georgia Tech's School of Electrical and Computer Engineering (ECE). Previously a professor of electrical engineering at Princeton University, he worked at AT&T Bell Laboratories. He has served as editor in chief of the ACM Transactions on Embedded Computing and of Design Automation for Embedded Systems.
Aug 7 2007 (0:15 AM)
URL: http://www.embedded.com/showArticle.jhtml?articleID=201203685
It is useful to understand how a high-level language program is translated into instructions. First, since implementing an embedded computing system often requires controlling the instruction sequences used to handle interrupts, placement of data and instructions in memory, and so forth, understanding how the compiler works can help you know when you cannot rely on the compiler.
Next, because many applications are also performance sensitive, understanding how code is generated can help you meet your performance goals, either by writing high-level code that gets compiled into the instructions you want or by recognizing when you must write your own assembly code.
Compilation = translation + optimization. Compilation combines translation and optimization. The high-level language program is translated into the lower-level form of instructions; optimizations try to generate better instruction sequences than would be possible if the brute force technique of independently translating source code statements were used.
Optimization techniques focus on more of the program to ensure that compilation decisions that appear to be good for one statement are not unnecessarily problematic for other parts of the program.
The compilation process is summarized in Figure 5-12 below. Compilation begins with high-level language code such as C and generally produces assembly code. (Directly producing object code simply duplicates the functions of an assembler, which is a very desirable stand-alone program to have.)
The highlevel language program is parsed to break it into statements and expressions. In addition, a symbol table is generated, which includes all the named objects in the program. Some compilers may then perform higher-level optimizations that can be viewed as modifying the high-level language program input without reference to instructions.
Figure 5-12. The compilation process.
Simplifying arithmetic expressions is one example of a machineindependent optimization. Not all compilers do such optimizations, and compilers can vary widely regarding which combinations of machine-independent optimizations they do perform. Instruction-level optimizations are aimed at generating code.
They may work directly on real instructions or on a pseudoinstruction format that is later mapped onto the instructions of the target CPU. This level of optimization also helps modularize the compiler by allowing code generation to create simpler code that is later optimized. For example, consider the following array access code:
x[i] = c*x[i];
A simple code generator would generate the address for x[i] twice, once for each appearance in the statement. The later optimization phases can recognize this as an example of common expressions that need not be duplicated.
While in this simple case it would be possible to create a code generator that never generated the redundant expression, taking into account every such optimization at code generation time is very difficult. We get better code and more reliable compilers by generating simple code first and then optimizing it.
Statement Translation
In this section, we consider the basic job of translating the high-level language program with little or no optimization. Let's first consider how to translate an expression.
A large amount of the code in a typical application consists of arithmetic and logical expressions. Understanding how to compile a single expression, as described in Example 5-2 below, is a good first step in understanding the entire compilation process.
Example 5-2 Compiling an Arithmetic Expression
In the following arithmetic expression,
a*b + 5*(c ' d)
the variable is written in terms of program variables. In some machines we may be able to perform memory-to-memory arithmetic directly on the locations corresponding to those variables. However, in many machines, such as the ARM, we must first load the variables into registers. This requires choosing which registers receive not only the named variables but also intermediate results such as (c ' d).
The code for the expression can be built by walking the data flow graph. The data flow graph for the expression appears below.
The temporary variables for the intermediate values and final result have been named w, x, y, and z. To generate code, we walk from the tree's root (where z, the final result, is generated) by traversing the nodes in post order. During the walk, we generate instructions to cover the operation at every node. The path is presented below.
The nodes are numbered in the order in which code is generated. Since every node in the data flow graph corresponds to an operation that is directly supported by the instruction set, we simply generate an instruction at every node. Since we are making an arbitrary register assignment, we can use up the registers in order starting with r1. The resulting ARM code follows:
; operator 1 (+) ADR r4,a ; get address for a MOV r1,[r4] ; load a ADR r4,b ; get address for b MOV r2,[r4] ; load b ADD r3,r1,r2 ; put w into r3 ; operator 2 (') ADR r4,c ; get address for c MOV r4,[r4] ; load c ADR r4,d ; get address for d MOV r5,[r4] ; load d SUB r6,r4,r5 ; put x into r6 ; operator 3 (*) MUL r7,r6,#5 ; operator 3, puts y into r7 ; operator 4 (+) ADD r8,r7,r3 ; operator 4, puts z into r8
The equivalent SHARC code appears below.
! operator 1 (+) R1 = DM(a); ! load a R2 = DM(b); ! load b R3 = R1 + R2; ! compute a + b ! operator 2 (') R4 = DM(c); ! load c R5 = DM(d); ! load d R6 = R4 ' R5; ! compute c ' d ! operator 3 (*) R7 = 5; R8 = R6 * R7; ! compute y ! operator 4 (+) R9 = R3 + R8; ! compute z
One obvious optimization is to reuse a register whose value is no longer needed. In the case of the intermediate values w, x, and y, we know that they cannot be used after the end of the expression (e.g., in another expression) since they have no name in the C program. However, the final result z may in fact be used in a C assignment and the value reused later in the program. In this case we would need to know when the register is no longer needed to determine its best use.
Figure 5-13. Flow of control in C and control flow diagrams
In the previous example, we made an arbitrary allocation of variables to registers for simplicity. When we have large programs with multiple expressions, we must allocate registers more carefully since CPUs have a limited number of registers. We will consider register allocation later.
We also need to be able to translate control structures. Since conditionals are controlled by expressions, the code generation techniques of the last example can be used for those expressions, leaving us with the task of generating code for the flow of control itself.
Figure 5-13 above shows a simple example of changing flow of control in C - an if statement, in which the condition controls whether the true or false branch of the if is taken. Figure 5-13 also shows the control flow diagram for the if statement. Example 5-3 illustrates how to implement conditionals in assembly language.
Example 5-3 Generating Code for a Conditional
Consider the following C statement:
if (a + b > 0) x = 5; else x = 7;
The CDFG for the statement appears below.
We know how to generate the code for the expressions. We can generate the control flow code by walking the CDFG. One ordered walk visits the condition test (node 1), the true branch (node 2), the false branch (node 3), and then the exit point of the statement (node 4).
To generate code, we must assign a label to the first instruction at the end of a directed edge and create a branch for each edge that does not go to the immediately following instruction. The exact steps to be taken at the branch points depend on the target architecture.
On some machines, evaluating expressions generates condition codes that we can test in subsequent branches, and on other machines we must use test-and-branch instructions. ARM allows us to test condition codes, so we get the following ARM code for the 1-2-3 walk:
        ADR r5,a        ; get address for a
        LDR r1,[r5]     ; load a
        ADR r5,b        ; get address for b
        LDR r2,[r5]     ; load b
        ADDS r3,r1,r2   ; compute a + b, setting the condition codes
        BLE label3      ; true condition falls through branch
        ; true case
        MOV r3,#5       ; load constant
        ADR r5,x        ; get address of x
        STR r3,[r5]     ; store value into x
        B stmtend       ; done with the true case
        ; false case
label3  MOV r3,#7       ; load constant
        ADR r5,x        ; get address of x
        STR r3,[r5]     ; store value into x
stmtend ...
The 1-2 and 3-4 edges do not require a branch and label because they are straight-line code. In contrast, the 1-3 and 2-4 edges do require a branch and a label for the target.
Since expressions are generally created as straight-line code, they typically require careful consideration of the order in which the operations are executed. We have much more freedom when generating conditional code because the branches ensure that the flow of control goes to the right block of code. If we walk the CDFG in a different order and lay out the code blocks in a different order in memory, we still get valid code as long as we properly place branches.
Loops are handled similarly: drawing a control flow graph based on the while form of a loop helps us understand how to translate it into instructions.
C compilers can generate assembler source (using the -S flag on many compilers, such as gcc), which some compilers intersperse with the C code. Such code is a very good way to learn about both assembly language programming and compilation. Web Aid 5-Assembly has examples of assembly output from C compilers for the ARM and SHARC.
Next in Part 4: The creation of procedures
To read Part 1, go to "Program design and analysis."
To read Part 2 , go to " Models of program, assemblers and linkers."
Used with the permission of the publisher, Newnes/Elsevier, this series of six articles is based on copyrighted material from "Computers as Components: Principles of Embedded Computer System Design" by Wayne Wolf. The book can be purchased on line.
Wayne Wolf is currently the Georgia Research Alliance Eminent Scholar holding the Rhesa "Ray" S. Farmer, Jr., Distinguished Chair in Embedded Computer Systems at Georgia Tech's School of Electrical and Computer Engineering (ECE). Previously a professor of electrical engineering at Princeton University, he worked at AT&T Bell Laboratories. He has served as editor in chief of the ACM Transactions on Embedded Computing and of Design Automation for Embedded Systems.
The basics of programming embedded processors: Part 4
By Wayne Wolf , Embedded.com
Aug 13 2007 (3:00 AM)
URL: http://www.embedded.com/showArticle.jhtml?articleID=201500002
Another major code generation problem is the creation of procedures. Generating code for procedures is relatively straightforward once we know the procedure linkage appropriate for the CPU. At the procedure definition, we generate the code to handle the procedure call and return. At each call of the procedure, we set up the procedure parameters and make the call.
The CPU's subroutine call mechanism is usually not sufficient to directly support procedures in modern programming languages. The procedure linkage mechanism provides a way for the program to pass parameters into the program and for the procedure to return a value.
It also provides help in restoring the values of registers that the procedure has modified. All procedures in a given programming language use the same linkage mechanism (although different languages may use different linkages). The mechanism can also be used to call handwritten assembly language routines from compiled code.
Procedure stacks are typically built to grow down from high addresses. A stack pointer (sp) defines the end of the current frame, while a frame pointer (fp) defines the end of the last frame. (The fp is technically necessary only if the stack frame can be grown by the procedure during execution.)
The procedure can refer to an element in the frame by addressing relative to sp. When a new procedure is called, the sp and fp are modified to push another frame onto the stack.
The ARM Procedure Call Standard (APCS) is a good illustration of a typical procedure linkage mechanism. Although the stack frames are in main memory, understanding how registers are used is key to understanding the mechanism, as explained below.
r0-r3 are used to pass parameters into the procedure. r0 is also used to hold the return value. If more than four parameters are required, they are put on the stack frame.
r4-r7 hold register variables.
r11 is the frame pointer and r13 is the stack pointer.
r10 holds the limiting address on stack size, which is used to check for stack overflows.
Other registers have additional uses in the protocol.
Figure 5-14. Layout of a one-dimensional array in memory
Data Structures
The compiler must also translate references to data structures into references to raw memory. In general, this requires address computations. Some of these computations can be done at compile time, while others must be done at run time. Arrays are interesting because the address of an array element must in general be computed at run time, since the array index may change. Let us first consider one-dimensional arrays:
a[i]
The layout of the array in memory is shown in Figure 5-14 above. The zeroth element is stored as the first element of the array, the first element directly below it, and so on. We can create a pointer for the array that points to the array's head, namely, a[0]. If we call that pointer aptr for convenience, then we can rewrite the reading of a[i] as
*(aptr + i)
Two-dimensional arrays are more challenging. There are multiple possible ways to lay out a two-dimensional array in memory, as shown in Figure 5-15 below. In this form, which is known as row major, the inner variable of the array (j in a[i,j]) varies most quickly. (Fortran uses a different organization known as column major.)
Two-dimensional arrays also require more sophisticated addressing - in particular, we must know the size of the array. Let us consider the row-major form. If the a[] array is of size N × M, then we can turn the two-dimensional array access into a one-dimensional array access. Thus,
a[i,j]
becomes
a[i*M + j]
where the maximum value for j is M - 1.
Figure 5-15. Memory layout for two-dimensional arrays
A C struct is easier to address. As shown in Figure 5-16 below, a structure is implemented as a contiguous block of memory. Fields in the structure can be accessed using constant offsets to the base address of the structure. In this example, if field1 is four bytes long, then field2 can be accessed as
*(aptr + 4)
This addition can usually be done at compile time, requiring only the indirection itself to fetch the memory location during execution.
Expression Simplification
Expression simplification is a useful area for machine-independent transformations. We can use the laws of algebra to simplify expressions. Consider the following expression:
a*b + a*c
We can use the distributive law to rewrite the expression as
a*(b + c)
Figure 5-16. C structure layout and access.
Since the new expression has only two operations rather than three for the original form, it is almost certainly cheaper, because it is both faster and smaller. Such transformations make some broad assumptions about the relative cost of operations.
In some cases, simple generalizations about the cost of operations may be misleading. For example, a CPU with a multiply-and-accumulate instruction may be able to do a multiply and addition as cheaply as it can do an addition. However, such situations can often be taken care of in code generation.
We can also use the laws of arithmetic to further simplify expressions on constants. Consider the following C statement:
for (i = 0; i < 8 + 1; i++)
We can simplify 8 + 1 to 9 at compile time - there is no need to perform that arithmetic while the program is executing. Why would a program ever contain expressions that evaluate to constants? Using named constants rather than numbers is good programming practice and often leads to constant expressions. The original form of the for statement could have been
for (i = 0; i < NOPS + 1; i++)
where, for example, the added 1 takes care of a trailing null character.
Dead Code Elimination
Code that will never be executed can be safely removed from the program. The general problem of identifying code that will never be executed is difficult, but there are some important special cases where it can be done. Programmers will intentionally introduce dead code in certain situations.
Consider this C code fragment:
#define DEBUG 0 ... if (DEBUG) print_debug_stuff();
In the above case, the print_debug_stuff() function is never executed, but the code allows the programmer to override the preprocessor variable definition (perhaps with a compile-time flag) to enable the debugging code.
This case is easy to analyze because the condition is the constant 0, which C uses for the false condition. Since there is no else clause in the if statement, the compiler can totally eliminate the if statement, rewriting the CDFG to provide a direct edge between the statements before and after the if.
Some dead code may be introduced by the compiler. For example, certain optimizations introduce copy statements that copy one variable to another. If uses of the first variable can be replaced by references to the second one, then the copy statement becomes dead code that can be eliminated.
Procedure Inlining
Another machine-independent transformation that requires a little more evaluation is procedure inlining. An inlined procedure does not have a separate procedure body and procedure linkage; rather, the body of the procedure is substituted in place for the procedure call. Figure 5-17 below shows an example of function inlining in C.
The C++ programming language provides an inline construct that tells the compiler to generate inline code for a function. In this case, an inlined procedure is generated in expanded form whenever possible. However, inlining is not always the best thing to do. It does eliminate the procedure linkage instructions. However, in a cached system, having multiple copies of the function body may actually slow down the fetches of these instructions. Inlining also increases code size, and memory may be precious.
Function definition:
int foo(a,b,c) { return a + b - c; }
Function call:
z = foo(w,x,y);
Inlining result:
z = w + x - y;
Figure 5-17. Function inlining in C.
Loop Transformations
Loops are important program structures - although they are compactly described in the source code, they often use a large fraction of the computation time. Many techniques have been designed to optimize loops.
A simple but useful transformation is known as loop unrolling, which is illustrated in Example 5-4 below. Loop unrolling is important because it helps expose parallelism that can be used by later stages of the compiler.
Example 5-4 Loop Unrolling
A simple loop in C follows:
for (i = 0; i < N; i++) { a[i]=b[i]*c[i]; }
This loop is executed a fixed number of times, namely, N. A straightforward implementation of the loop would create and initialize the loop variable i, update its value on every iteration, and test it to see whether to exit the loop. However, since the loop is executed a fixed number of times, we can generate more direct code.
If we let N = 4, then we can replace the above C code with the following straight-line version:
a[0] = b[0]*c[0]; a[1] = b[1]*c[1]; a[2] = b[2]*c[2]; a[3] = b[3]*c[3];
This unrolled code has no loop overhead code at all, that is, no iteration variable and no tests. But the unrolled loop has the same problems as the inlined procedure—it may interfere with the cache and expands the amount of code required.
We do not, of course, have to fully unroll loops. Rather than unroll the above loop four times, we could unroll it twice. The following code results:
for (i = 0; i < 2; i++) { a[i*2] = b[i*2]*c[i*2]; a[i*2 + 1] = b[i*2 + 1]*c[i*2 + 1]; }
In this case, since all operations in the two lines of the loop body are independent, later stages of the compiler may be able to generate code that allows them to be executed efficiently on the CPU's pipeline.
Loop fusion combines two or more loops into a single loop. For this transformation to be legal, two conditions must be satisfied. First, the loops must iterate over the same values. Second, the loop bodies must not have dependencies that would be violated if they are executed together - for example, if the second loop's ith iteration depends on the results of the (i + 1)th iteration of the first loop, the two loops cannot be combined. Loop distribution is the opposite of loop fusion, that is, decomposing a single loop into multiple loops.
Loop tiling breaks up a loop into a set of nested loops, with each inner loop performing the operations on a subset of the data. An example is shown in Figure 5-18 below. Here, each loop is broken up into tiles of size two. Each loop is split into two loops - for example, the inner ii loop iterates within the tile and the outer i loop iterates across the tiles.
The result is that the pattern of accesses across the a array is drastically different - rather than walking across one row in its entirety, the code walks through rows and columns following the tile structure. Loop tiling changes the order in which array elements are accessed, thereby allowing us to better control the behavior of the cache during loop execution.
Figure 5-18. Loop tiling.
We can also modify the arrays being indexed in loops. Array padding adds dummy data elements to a loop in order to change the layout of the array in the cache. Although these array locations will not be used, they do change how the useful array elements fall into cache lines. Judicious padding can in some cases significantly reduce the number of cache conflicts during loop execution.
Next in Part 5: Register allocation and scheduling
To read Part 1, go to "Program design and analysis."
To read Part 2 , go to " Models of program, assemblers and linkers."
To read Part 3, go to "Basic compilation techniques"
Aug 13 2007 (3:00 AM)
URL: http://www.embedded.com/showArticle.jhtml?articleID=201500002
Another major code generation problem is the creation of procedures. Generating code for procedures is relatively straightforward once we know the procedure linkage appropriate for the CPU. At the procedure definition, we generate the code to handle the procedure call and return. At each call of the procedure, we set up the procedure parameters and make the call.
The CPU's subroutine call mechanism is usually not sufficient to directly support procedures in modern programming languages. The procedure linkage mechanism provides a way for the program to pass parameters into the program and for the procedure to return a value.
It also provides help in restoring the values of registers that the procedure has modified. All procedures in a given programming language use the same linkage mechanism (although different languages may use different linkages). The mechanism can also be used to call handwritten assembly language routines from compiled code.
Procedure stacks are typically built to grow down from high addresses. A stack pointer (sp) defines the end of the current frame, while a frame pointer (fp) defines the end of the last frame. (The fp is technically necessary only if the stack frame can be grown by the procedure during execution.)
The procedure can refer to an element in the frame by addressing relative to sp. When a new procedure is called, the sp and fp are modified to push another frame onto the stack.
The ARM Procedure Call Standard (APCS) is a good illustration of a typical procedure linkage mechanism. Although the stack frames are in main memory, understanding how registers are used is key to understanding the mechanism, as explained below.
r0"r3 are used to pass parameters into the procedure. r0 is also used to hold the return value. If more than four parameters are required, they are put on the stack frame.
r4"r7 hold register variables.
r11 is the frame pointer and r13 is the stack pointer.
r10 holds the limiting address on stack size, which is used to check for stack overflows.
Other registers have additional uses in the protocol.
Figure 5-14. Layout of a one-dimensional array in memory
Data Structures
The compiler must also translate references to data structures into references to raw memories. In general, this requires address computations. Some of these computations can be done at compile time while others must be done at run time. arrays Arrays are interesting because the address of an array element must in general be computed at run time, since the array index may change. Let us first consider one-dimensional arrays:
a[i]
The layout of the array in memory is shown in Figure 5-14 above. The zeroth element is stored as the first element of the array, the first element directly below, and so on. We can create a pointer for the array that points to the array's head, namely, a[0]. If we call that pointer aptr for convenience, then we can rewrite the reading of a [i] as
*(aptr + i)
Two-dimensional arrays are more challenging. There are multiple possible ways to lay out a two-dimensional array in memory, as shown in Figure 5-15 below. In this form, which is known as row major, the inner variable of the array (j in a[i,j]) varies most quickly. (Fortran uses a different organization known as column major.)
Two-dimensional arrays also require more sophisticated addressing - in particular, we must know the size of the array. Let us consider the row-major form. If the a[] array is of size N × M, then we can turn the two-dimensional array access into a one-dimensional array access. Thus,
a[i,j]
becomes
a[i*M + j]
where the maximum value for j is M ' 1.
Figure 5-15. Memory layout for two-dimensional arrays
A C struct is easier to address. As shown in Figure 5-16 below, a structure is implemented as a contiguous block of memory. Fields in the structure can be accessed using constant offsets to the base address of the structure. In this example, if field1 is four bytes long, then field2 can be accessed as
*(aptr + 4)
This addition can usually be done at compile time, requiring only the indirection itself to fetch the memory location during execution.
Expression Simplification
Expression simplification is a useful area for machine-independent transformations. We can use the laws of algebra to simplify expressions. Consider the following expression:
a*b + a*c
We can use the distributive law to rewrite the expression as
a*(b + c)
Figure 5-16. C structure layout and access.
Since the new expression has only two operations rather than three for the original form, it is almost certainly cheaper, because it is both faster and smaller. Such transformations make some broad assumptions about the relative cost of operations.
In some cases, simple generalizations about the cost of operations may be misleading. For example, a CPU with a multiply-andaccumulate instruction may be able to do a multiply and addition as cheaply as it can do an addition. However, such situations can often be taken care of in code generation.
We can also use the laws of arithmetic to further simplify expressions on constants. Consider the following C statement:
for (i = 0; i < 8 + 1; i++)
We can simplify 8 + 1 to 9 at compile time - there is no need to perform that arithmetic while the program is executing. Why would a program ever contain expressions that evaluate to constants? Using named constants rather than numbers is good programming practice and often leads to constant expressions. The original form of the for statement could have been
for (i = 0; i < NOPS + 1; i++)
where, for example, the added 1 takes care of a trailing null character.
Dead Code Elimination
Code that will never be executed can be safely removed from the program. The general problem of identifying code that will never be executed is difficult, but there are some important special cases where it can be done. Programmers will intentionally introduce dead code in certain situations.
Consider this C code fragment:
#define DEBUG 0
...
if (DEBUG) print_debug_stuff();
In the above case, the print_debug_stuff() function is never executed, but the code allows the programmer to override the preprocessor variable definition (perhaps with a compile-time flag) to enable the debugging code.
This case is easy to analyze because the condition is the constant 0, which C uses for the false condition. Since there is no else clause in the if statement, the compiler can totally eliminate the if statement, rewriting the CDFG to provide a direct edge between the statements before and after the if.
Some dead code may be introduced by the compiler. For example, certain optimizations introduce copy statements that copy one variable to another. If uses of the first variable can be replaced by references to the second one, then the copy statement becomes dead code that can be eliminated.
Procedure Inlining
Another machine-independent transformation that requires a little more evaluation is procedure inlining. An inlined procedure does not have a separate procedure body and procedure linkage; rather, the body of the procedure is substituted in place for the procedure call. Figure 5-17 below shows an example of function inlining in C.
The C++ programming language provides an inline construct that tells the compiler to generate inline code for a function. In this case, an inlined procedure is generated in expanded form whenever possible. However, inlining is not always the best thing to do. It does eliminate the procedure linkage instructions. However, in a cached system, having multiple copies of the function body may actually slow down the fetches of these instructions. Inlining also increases code size, and memory may be precious.
Function definition:
int foo(a,b,c) { return a + b - c; }
Function call:
z = foo(w,x,y);
Inlining result:
z = w + x - y;
Figure 5-17. Function inlining in C.
Loop Transformations
Loops are important program structures - although they are compactly described in the source code, they often use a large fraction of the computation time. Many techniques have been designed to optimize loops.
A simple but useful transformation is known as loop unrolling, which is illustrated in Example 5-4 below. Loop unrolling is important because it helps expose parallelism that can be used by later stages of the compiler.
Example 5-4 Loop Unrolling
A simple loop in C follows:
for (i = 0; i < N; i++) { a[i]=b[i]*c[i]; }
This loop is executed a fixed number of times, namely, N. A straightforward implementation of the loop would create and initialize the loop variable i, update its value on every iteration, and test it to see whether to exit the loop. However, since the loop is executed a fixed number of times, we can generate more direct code.
If we let N = 4, then we can replace the loop above with the following straight-line code:
a[0] = b[0]*c[0];
a[1] = b[1]*c[1];
a[2] = b[2]*c[2];
a[3] = b[3]*c[3];
This unrolled code has no loop overhead code at all, that is, no iteration variable and no tests. But the unrolled loop has the same problems as the inlined procedure—it may interfere with the cache and expands the amount of code required.
We do not, of course, have to fully unroll loops. Rather than unroll the above loop four times, we could unroll it twice. The following code results:
for (i = 0; i < 2; i++) {
   a[i*2] = b[i*2]*c[i*2];
   a[i*2 + 1] = b[i*2 + 1]*c[i*2 + 1];
}
In this case, since all operations in the two lines of the loop body are independent, later stages of the compiler may be able to generate code that allows them to be executed efficiently on the CPU's pipeline.
Loop fusion combines two or more loops into a single loop. For this transformation to be legal, two conditions must be satisfied. First, the loops must iterate over the same values. Second, the loop bodies must not have dependencies that would be violated if they are executed together - for example, if the second loop's ith iteration depends on the results of the (i + 1)th iteration of the first loop, the two loops cannot be combined. Loop distribution is the opposite of loop fusion, that is, decomposing a single loop into multiple loops.
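As a concrete sketch of fusion (the array names, size, and loop bodies are invented): the two loops below iterate over the same values, and each iteration of the second loop depends only on the same iteration of the first, so merging them is legal.

```c
#include <assert.h>

#define N 8

/* Before fusion:
       for (i = 0; i < N; i++) a[i] = b[i] + 1;
       for (i = 0; i < N; i++) c[i] = a[i] * 2;
   After fusion: one pass over the data with a single set of
   loop overhead (one index variable, one exit test). */
void fused(int a[N], int b[N], int c[N])
{
    for (int i = 0; i < N; i++) {
        a[i] = b[i] + 1;     /* body of the first loop  */
        c[i] = a[i] * 2;     /* body of the second loop */
    }
}
```

Fusion can also improve cache behavior, since a[i] is still resident when the second statement reads it.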
Loop tiling breaks up a loop into a set of nested loops, with each inner loop performing the operations on a subset of the data. An example is shown in Figure 5-18 below. Here, each loop is broken up into tiles of size two. Each loop is split into two loops - for example, the inner ii loop iterates within the tile and the outer i loop iterates across the tiles.
The result is that the pattern of accesses across the a array is drastically different - rather than walking across one row in its entirety, the code walks through rows and columns following the tile structure. Loop tiling changes the order in which array elements are accessed, thereby allowing us to better control the behavior of the cache during loop execution.
Figure 5-18. Loop tiling.
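A minimal sketch of tiling with tiles of size two, as in the figure (the array size and the summation body are invented): the ii and jj loops walk within a tile while i and j step across tiles, so only the access order changes, not the result.

```c
#include <assert.h>

#define N    4
#define TILE 2   /* tile size of two, as in Figure 5-18 */

/* Visit every element of a[][] tile by tile: the outer i and j
   loops iterate across tiles, the inner ii and jj loops iterate
   within one tile. Summation is order-independent, so the tiled
   walk must produce the same total as a row-by-row walk. */
int tiled_sum(int a[N][N])
{
    int sum = 0;
    for (int i = 0; i < N; i += TILE)
        for (int j = 0; j < N; j += TILE)
            for (int ii = i; ii < i + TILE; ii++)
                for (int jj = j; jj < j + TILE; jj++)
                    sum += a[ii][jj];
    return sum;
}
```

In a real optimization the payoff comes when the working set of one tile fits in the cache while a full row does not.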
We can also modify the arrays being indexed in loops. Array padding adds dummy data elements to a loop in order to change the layout of the array in the cache. Although these array locations will not be used, they do change how the useful array elements fall into cache lines. Judicious padding can in some cases significantly reduce the number of cache conflicts during loop execution.
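A sketch of padding (all sizes hypothetical): if a direct-mapped cache holds 64 words and each row has 64 elements, then the same column of every row maps to the same cache line; one dummy element per row staggers the rows across the cache.

```c
#include <assert.h>

#define ROWS 8
#define COLS 64   /* hypothetical: one row exactly fills the cache */
#define PAD  1    /* dummy element per row, never read or written */

/* Without padding, a[i][j] and a[i+1][j] are COLS words apart and
   collide in a COLS-word direct-mapped cache. With the pad, rows
   are COLS + 1 words apart, so successive rows start on different
   cache lines. */
int a_unpadded[ROWS][COLS];
int a_padded[ROWS][COLS + PAD];
```

The cost is a small amount of wasted memory per row, which the loop code simply skips over.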
Next in Part 5: Register allocation and scheduling
To read Part 1, go to "Program design and analysis."
To read Part 2, go to "Models of program, assemblers and linkers."
To read Part 3, go to "Basic compilation techniques"
Used with the permission of the publisher, Newnes/Elsevier, this series of six articles is based on copyrighted material from "Computers as Components: Principles of Embedded Computer System Design" by Wayne Wolf. The book can be purchased on line.
Wayne Wolf is currently the Georgia Research Alliance Eminent Scholar holding the Rhesa "Ray" S. Farmer, Jr., Distinguished Chair in Embedded Computer Systems at Georgia Tech's School of Electrical and Computer Engineering (ECE). Previously a professor of electrical engineering at Princeton University, he worked at AT&T Bell Laboratories. He has served as editor in chief of the ACM Transactions on Embedded Computing and of Design Automation for Embedded Systems.
Tuesday, July 31, 2007
The basics of programming embedded processors: Part 2
By Wayne Wolf, Embedded.com
Jul 31 2007 (0:15 AM)
URL: http://www.embedded.com/showArticle.jhtml?articleID=201201938
Where previously in Part 1 we covered the basics of program design and analysis and the usefulness of design patterns, in this part of the series we develop models for programs that are more general than source code.
Why not use the source code directly? First, there are many different types of source code - assembly languages, C code, and so on - but we can use a single model to describe all of them. Second, once we have such a model, we can perform many useful analyses on the model more easily than we could on the source code.
Our fundamental model for programs is the control/data flow graph (CDFG). (We can also model hardware behavior with the CDFG.) As the name implies, the CDFG has constructs that model both data operations (arithmetic and other computations) and control operations (conditionals). Part of the power of the CDFG comes from its combination of control and data constructs. To understand the CDFG, we start with pure data descriptions and then extend the model to control.
Data Flow Graphs
A data flow graph is a model of a program with no conditionals. In a high-level programming language, a code segment with no conditionals—more precisely, with only one entry and exit point—is known as a basic block. Figure 5-3 below shows a simple basic block. As the C code is executed, we would enter this basic block at the beginning and execute all the statements.
w = a + b;
x = a - c;
y = x + d;
x = a + c;
z = y + e;
Figure 5-3. A basic block in C
Before we are able to draw the data flow graph for this code we need to modify it slightly. There are two assignments to the variable x - it appears twice on the left side of an assignment. We need to rewrite the code in single-assignment form, in which a variable appears only once on the left side.
Since our specification is C code, we assume that the statements are executed sequentially, so that any use of a variable refers to its latest assigned value. In this case, x is not reused in this block (presumably it is used elsewhere), so we just have to eliminate the multiple assignment to x. The result is shown in Figure 5-4 below, where we have used the names x1 and x2 to distinguish the separate uses of x.
w = a + b;
x1 = a - c;
y = x1 + d;
x2 = a + c;
z = y + e;
Figure 5-4. The basic block in single-assignment form.
The single-assignment form is important because it allows us to identify a unique location in the code where each named location is computed. As an introduction to the data flow graph, we use two types of nodes in the graph—round nodes denote operators and square nodes represent values. The value nodes may be either inputs to the basic block, such as a and b, or variables assigned to within the block, such as w and x1. The data flow graph for our single-assignment code is shown in Figure 5-5 below.
The single-assignment form means that the data flow graph is acyclic—if we assigned to x multiple times, then the second assignment would form a cycle in the graph including x and the operators used to compute x.
Keeping the data flow graph acyclic is important in many types of analyses we want to do on the graph. (Of course, it is important to know whether the source code actually assigns to a variable multiple times, because some of those assignments may be mistakes. We consider the analysis of source code for proper use of assignments later in this series.)
Figure 5-5. An extended data flow graph for our sample block
The data flow graph is generally drawn in the form shown in Figure 5-6 below. Here, the variables are not explicitly represented by nodes. Instead, the edges are labeled with the variables they represent. As a result, a variable can be represented by more than one edge. However, the edges are directed and all the edges for a variable must come from a single source. We use this form for its simplicity and compactness.
Figure 5-6. Standard data flow graph for sample basic block
The data flow graph for the code makes the order in which the operations are performed in the C code much less obvious. This is one of the advantages of the data flow graph. We can use it to determine feasible reorderings of the operations, which may help us to reduce pipeline or cache conflicts.
We can also use it when the exact order of operations simply doesn't matter. The data flow graph defines a partial ordering of the operations in the basic block. We must ensure that a value is computed before it is used, but generally there are several possible orderings of evaluating expressions that satisfy this requirement.
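The partial ordering can be checked directly: the sketch below evaluates the single-assignment block of Figure 5-4 in two different orders that both respect the data flow edges and produces identical values. The second ordering is just one invented example of a legal reordering.

```c
#include <assert.h>

/* The block of Figure 5-4 in its original source order.
   out[0..3] hold w, x2, y, z respectively. */
void source_order(int a, int b, int c, int d, int e, int out[4])
{
    int x1;
    out[0] = a + b;        /* w  */
    x1     = a - c;
    out[2] = x1 + d;       /* y  */
    out[1] = a + c;        /* x2 */
    out[3] = out[2] + e;   /* z  */
}

/* A different order that still computes every value before it is
   used: w, x1, and x2 are mutually independent, so they may be
   evaluated in any relative order. */
void reordered(int a, int b, int c, int d, int e, int out[4])
{
    int x1;
    out[1] = a + c;        /* x2 moved first          */
    x1     = a - c;
    out[0] = a + b;        /* w                       */
    out[2] = x1 + d;       /* y: x1 already computed  */
    out[3] = out[2] + e;   /* z: y already computed   */
}
```

A compiler exploits exactly this freedom when scheduling operations to avoid pipeline or cache conflicts.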
Control/Data Flow Graphs
A CDFG uses a data flow graph as an element, adding constructs to describe control. In a basic CDFG, we have two types of nodes: decision nodes and data flow nodes.
A data flow node encapsulates a complete data flow graph to represent a basic block. We can use one type of decision node to describe all the types of control in a sequential program. (The jump/branch is, after all, the way we implement all those high-level control constructs.)
Figure 5-7 below shows a bit of C code with control constructs and the CDFG constructed from it. The rectangular nodes in the graph represent the basic blocks. The basic blocks in the C code have been represented by function calls for simplicity. The diamond-shaped nodes represent the conditionals. The node's condition is given by the label, and the edges are labeled with the possible outcomes of evaluating the condition.
Building a CDFG for a while loop is straightforward, as shown in Figure 5-8 below. The while loop consists of both a test and a loop body, each of which we know how to represent in a CDFG. We can represent for loops by remembering that, in C, a for loop is defined in terms of a while loop. The following for loop
for (i = 0; i < N; i++) {
loop_body();
}
is equivalent to
i = 0;
while (i < N) {
   loop_body();
   i++;
}
Figure 5-8. C code (above) and its CDFG (below) for a while loop.
The C code represented by the CDFG of Figure 5-7 is:
if (cond1)
   basic_block_1();
else
   basic_block_2();
basic_block_3();
switch (test1) {
   case c1: basic_block_4(); break;
   case c2: basic_block_5(); break;
   case c3: basic_block_6(); break;
}
Figure 5-7. C code (above) and its CDFG (below)
Hierarchical representation. For a complete CDFG model, we can use a data flow graph to model each data flow node. Thus, the CDFG is a hierarchical representation—a data flow CDFG can be expanded to reveal a complete data flow graph. An execution model for a CDFG is very much like the execution of the program it represents. The CDFG does not require explicit declaration of variables, but we assume that the implementation has sufficient memory for all the variables.
We can define a state variable that represents a program counter in a CPU. (When studying a drawing of a CDFG, a finger works well for keeping track of the program counter state.) As we execute the program, we either execute the data flow node or compute the decision in the decision node and follow the appropriate edge, depending on the type of node the program counter points to.
Even though the data flow nodes may specify only a partial ordering on the data flow computations, the CDFG is a sequential representation of the program. There is only one program counter in our execution model of the CDFG, and operations are not executed in parallel.
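This execution model can be sketched as a tiny interpreter. The graph below encodes i = 0; while (i < n) { sum += i; i++; } as one decision node and one data flow node; the node table and its fields are invented for illustration.

```c
#include <assert.h>

enum kind { DATAFLOW, DECISION, EXIT_NODE };

struct node {
    enum kind kind;
    int succ_true;    /* edge taken when the condition is true,
                         or the sole successor of a data flow node */
    int succ_false;   /* edge taken when the condition is false */
};

/* CDFG for: i = 0; while (i < n) { sum += i; i++; }  */
static int execute(int n)
{
    static const struct node graph[] = {
        { DECISION,  1, 2 },  /* node 0: test i < n                  */
        { DATAFLOW,  0, 0 },  /* node 1: loop body, back to the test */
        { EXIT_NODE, 0, 0 },  /* node 2: fall out of the loop        */
    };
    int i = 0, sum = 0;
    int pc = 0;               /* the single program counter */
    for (;;) {
        switch (graph[pc].kind) {
        case DECISION:        /* compute the decision, follow an edge */
            pc = (i < n) ? graph[pc].succ_true : graph[pc].succ_false;
            break;
        case DATAFLOW:        /* execute the whole basic block */
            sum += i;
            i++;
            pc = graph[pc].succ_true;
            break;
        case EXIT_NODE:
            return sum;
        }
    }
}
```

Note that there is exactly one pc: the model is sequential even though the data flow inside node 1 is only partially ordered.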
The CDFG is not necessarily tied to high-level language control structures. We can also build a CDFG for an assembly language program. A jump instruction corresponds to a nonlocal edge in the CDFG. Some architectures, such as ARM and many VLIW processors, support predicated execution of instructions, which may be represented by special constructs in the CDFG.
Assembly and linking
Assembly and linking are the last steps in the compilation process—they turn a list of instructions into an image of the program's bits in memory. In this section, we survey the basic techniques required for assembly and linking to help us understand the complete compilation process.
Figure 5-9 below highlights the role of assemblers and linkers in the compilation process. This process is often hidden from us by compilation commands that do everything required to generate an executable program. As the figure shows, most compilers do not directly generate machine code, but instead create the instruction-level program in the form of human-readable assembly language.
Generating assembly language rather than binary instructions frees the compiler writer from details extraneous to the compilation process, which include the instruction format as well as the exact addresses of instructions and data. The assembler's job is to translate symbolic assembly language statements into bit-level representations of instructions known as object code.
The assembler takes care of instruction formats and does part of the job of translating labels into addresses. However, since the program may be built from many files, the final steps in determining the addresses of instructions and data are performed by the linker, which produces an executable binary file. That file may not necessarily be located in the CPU's memory, however, unless the linker happens to create the executable directly in RAM. The program that brings the program into memory for execution is called a loader.
Figure 5-9. Program generation from compilation through loading.
The simplest form of the assembler assumes that the starting address of the assembly language program has been specified by the programmer. The addresses in such a program are known as absolute addresses.
However, in many cases, particularly when we are creating an executable out of several component files, we do not want to specify the starting addresses for all the modules before assembly.
If we did, we would have to determine before assembly not only the length of each program in memory but also the order in which they would be linked into the program.
Most assemblers therefore allow us to use relative addresses by specifying at the start of the file that the origin of the assembly language module is to be computed later. Addresses within the module are then computed relative to the start of the module. The linker is then responsible for translating relative addresses into absolute addresses.
Assemblers. When translating assembly code into object code, the assembler must translate opcodes and format the bits in each instruction, and translate labels into addresses. In this section, we review the translation of assembly language into binary.
Labels make the assembly process more complex, but they are the most important abstraction provided by the assembler. Labels let the programmer (a human programmer or a compiler generating assembly code) avoid worrying about the absolute locations of instructions and data. Label processing requires making two passes through the assembly source code as follows:
1. The first pass scans the code to determine the address of each label.
2. The second pass assembles the instructions using the label values computed in the first pass.
As shown in Figure 5-10 below, the name of each symbol and its address is stored in a symbol table that is built during the first pass. The symbol table is built by scanning from the first instruction to the last. (For the moment, we assume that we know the absolute address of the first instruction in the program; we consider the general case later in this series).
During scanning, the current location in memory is kept in a program location counter (PLC). Despite the similarity in name to a program counter, the PLC is not used to execute the program, only to assign memory locations to labels. For example, the PLC always makes exactly one pass through the program, whereas the program counter makes many passes over code in a loop.
Thus, at the start of the first pass, the PLC is set to the program's starting address and the assembler looks at the first line. After examining the line, the assembler updates the PLC to the next location (since ARM instructions are four bytes long, the PLC would be incremented by four) and looks at the next instruction.
If the instruction begins with a label, a new entry is made in the symbol table, which includes the label name and its value. The value of the label is equal to the current value of the PLC. At the end of the first pass, the assembler rewinds to the beginning of the assembly language file to make the second pass. During the second pass, when a label name is found, the label is looked up in the symbol table and its value substituted into the appropriate place in the instruction.
Figure 5-10. Symbol table processing during assembly.
But how do we know the starting value of the PLC? The simplest case is absolute addressing. In this case, one of the first statements in the assembly language program is a pseudo-op that specifies the origin of the program, that is, the location of the first address in the program. A common name for this pseudo-op (e.g., the one used for the ARM) is the ORG statement
ORG 2000
which puts the start of the program at location 2000. This pseudo-op accomplishes this by setting the PLC's value to its argument's value, 2000 in this case. Assemblers generally allow a program to have many ORG statements in case instructions or data must be spread around various spots in memory. Example 5-1 below illustrates the use of the PLC in generating the symbol table.
Example 5-1: Generating a Symbol Table
Let's use the following simple example of ARM assembly code:
        ORG 100
label1  ADR r4,c
        LDR r0,[r4]
label2  ADR r4,d
        LDR r1,[r4]
label3  SUB r0,r0,r1
The initial ORG statement tells us the starting address of the program. To begin, let's initialize the symbol table to an empty state and put the PLC at the initial ORG statement.
The PLC value shown is at the beginning of this step, before we have processed the ORG statement. The ORG tells us to set the PLC value to 100.
To process the next statement, we move the PLC to point to the next statement. But because the last statement was a pseudo-op that generates no memory values, the PLC value remains at 100.
Because there is a label in this statement, we add it to the symbol table, taking its value from the current PLC value.
To process the next statement, we advance the PLC to point to the next line of the program and increment its value by the length in memory of the last line, namely, 4.
We continue this process as we scan the program until we reach the end, at which point the state of the PLC and symbol table are as shown below.
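The first pass of Example 5-1 can be sketched in C. The line and symbol-table representations here are invented; the only properties carried over from the text are that labels take the current PLC value and that each instruction advances the PLC by four bytes, as on the ARM.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS 16

struct sym { const char *name; int addr; };

static struct sym table[MAX_SYMS];
static int nsyms;

/* One assembly line, reduced to what the first pass cares about:
   an optional label. (Opcodes matter only through their length.) */
struct line { const char *label; };

/* First pass: scan the lines, assigning the current PLC value to
   each label and advancing the PLC by four per instruction. */
static void first_pass(const struct line *lines, int n, int origin)
{
    int plc = origin;   /* starting address, as set by the ORG pseudo-op */
    nsyms = 0;
    for (int i = 0; i < n; i++) {
        if (lines[i].label && nsyms < MAX_SYMS) {
            table[nsyms].name = lines[i].label;
            table[nsyms].addr = plc;
            nsyms++;
        }
        plc += 4;       /* every ARM instruction is four bytes long */
    }
}

/* Second-pass helper: look a label up in the symbol table. */
static int lookup(const char *name)
{
    for (int i = 0; i < nsyms; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].addr;
    return -1;          /* undefined label */
}
```

Running the first pass over the five instructions of Example 5-1 with origin 100 assigns label1, label2, and label3 the addresses 100, 108, and 116, matching the hand trace above.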
Assemblers allow labels to be added to the symbol table without occupying space in the program memory. A typical name of this pseudo-op is EQU for equate. For example, in the code
     ADD r0,r1,r2
FOO  EQU 5
BAZ  SUB r3,r4,#FOO
the EQU pseudo-op adds a label named FOO with the value 5 to the symbol table. The value of the BAZ label is the same as if the EQU pseudo-op were not present, since EQU does not advance the PLC. The new label is used in the subsequent SUB instruction as the name for a constant. EQUs can be used to define symbolic values to help make the assembly code more structured.
The ARM assembler supports one pseudo-op that is particular to the ARM instruction set. In other architectures, an address would be loaded into a register (e.g., for an indirect access) by reading it from a memory location. ARM does not have an instruction that can load an effective address, so the assembler supplies the ADR pseudo-op to create the address in the register.
It does so by using ADD or SUB instructions to generate the address. The address to be loaded can be register relative, program relative, or numeric, but it must assemble to a single instruction. More complicated address calculations must be explicitly programmed.
The assembler produces an object file that describes the instructions and data in binary format. A commonly used object file format, originally developed for Unix but now used in other environments as well, is known as COFF (common object file format). The object file must describe the instructions, data, and any addressing information and also usually carries along the symbol table for later use in debugging.
Generating relative code rather than absolute code introduces some new challenges to the assembly language process. Rather than using an ORG statement to provide the starting address, the assembly code uses a pseudo-op to indicate that the code is in fact relocatable. (Relative code is the default for both the ARM and SHARC assemblers.)
Similarly, we must mark the output object file as being relative code. We can initialize the PLC to 0 to denote that addresses are relative to the start of the file. However, when we generate code that makes use of those labels, we must be careful, since we do not yet know the actual value that must be put into the bits.
We must instead generate relocatable code. We use extra bits in the object file format to mark the relevant fields as relocatable and then insert the label's relative value into the field.
The linker must therefore modify the generated code - when it finds a field marked as relative, it uses the absolute addresses that it has generated to replace the relative value with a correct, absolute value for the address. To understand the details of turning relocatable code into absolute executable code, we must understand the linking process described in the next section.
Linking. Many assembly language programs are written as several smaller pieces rather than as a single large file. Breaking a large program into smaller files helps delineate program modularity. If the program uses library routines, those will already be preassembled, and assembly language source code for the libraries may not be available for purchase.
A linker allows a program to be stitched together out of several smaller pieces. The linker operates on the object files created by the assembler and modifies the assembled code to make the necessary links between files.
Some labels will be both defined and used in the same file. Other labels will be defined in a single file but used elsewhere as illustrated in Figure 5-11 below. The place in the file where a label is defined is known as an entry point. The place in the file where the label is used is called an external reference.
The main job of the linker is to resolve external references based on available entry points. As a result of the need to know how definitions and references connect, the assembler passes to the linker not only the object file but also the symbol table. Even if the entire symbol table is not kept for later debugging purposes, it must at least pass the entry points. External references are identified in the object code by their relative symbol identifiers.
Figure 5-11. External references and entry points.
The linker proceeds in two phases. First, it determines the absolute address of the start of each object file. The order in which object files are to be loaded is given by the user, either by specifying parameters when the linker is run or by creating a load map file that gives the order in which files are to be placed in memory.
Given the order in which files are to be placed in memory and the length of each object file, it is easy to compute the absolute starting address of each file. At the start of the second phase, the linker merges all symbol tables from the object files into a single, large table. It then edits the object files to change relative addresses into absolute addresses.
This is typically performed by having the assembler write extra bits into the object file to identify the instructions and fields that refer to labels. If a label cannot be found in the merged symbol table, it is undefined and an error message is sent to the user.
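The patching step can be sketched as follows (the object-file representation is invented): each relocation record gives the position of a field holding an address relative to the module's start, and the linker adds the module's absolute load address to it.

```c
#include <assert.h>

/* One relocation record: the index of a code word whose contents
   are a module-relative address that must be made absolute. */
struct reloc { int offset; };

/* Second phase of linking, for one object module: every field the
   assembler flagged as relocatable gets the module's absolute
   load address added in. */
static void relocate(unsigned code[], const struct reloc *relocs,
                     int nrelocs, unsigned load_address)
{
    for (int i = 0; i < nrelocs; i++)
        code[relocs[i].offset] += load_address;   /* relative -> absolute */
}
```

Words without a relocation record (opcodes, constants) pass through untouched; only the flagged address fields are edited.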
Controlling where code modules are loaded into memory is important in embedded systems. Some data structures and instructions, such as those used to manage interrupts, must be put at precise memory locations for them to work. In other cases, different types of memory may be installed at different address ranges. For example, if we have EPROM in some locations and DRAM in others, we want to make sure that locations to be written are put in the DRAM locations.
Workstations and PCs provide dynamically linked libraries, and certain sophisticated embedded computing environments may provide them as well. Rather than link a separate copy of commonly used routines such as I/O to every executable program on the system, dynamically linked libraries allow them to be linked in at the start of program execution.
A brief linking process is run just before execution of the program begins; the dynamic linker uses code libraries to link in the required routines. This not only saves storage space but also allows programs that use those libraries to be easily updated. However, it does introduce a delay before the program starts executing.
Next, in Part 3: Basic compilation techniques.
To read Part 1, go to "Program design and analysis."
Used with the permission of the publisher, Newnes/Elsevier, this series of six articles is based on copyrighted material from "Computers as Components: Principles of Embedded Computer System Design" by Wayne Wolf. The book can be purchased on line.
Wayne Wolf is currently the Georgia Research Alliance Eminent Scholar holding the Rhesa "Ray" S. Farmer, Jr., Distinguished Chair in Embedded Computer Systems at Georgia Tech's School of Electrical and Computer Engineering (ECE). Previously a professor of electrical engineering at Princeton University, he worked at AT&T Bell Laboratories. He has served as editor in chief of the ACM Transactions on Embedded Computing and of Design Automation for Embedded Systems.
Copyright 2005 © CMP Media LLC
By Wayne Wolf, Embedded.com
Jul 31 2007 (0:15 AM)
URL: http://www.embedded.com/showArticle.jhtml?articleID=201201938
Where previously in Part 1 we covered the basics of program design and analysis and the usefulness of design patterns, in this part in this series of articles we develop models for programs that are more general than source code.
Why not use the source code directly? First, there are many different types of source code - assembly languages, C code, and so on - but we can use a single model to describe all of them. Second, once we have such a model, we can perform many useful analyses on the model more easily than we could on the source code.
Our fundamental model for programs is the control/data flow graph (CDFG). (We can also model hardware behavior with the CDFG.) As the name implies, the CDFG has constructs that model both data operations (arithmetic and other computations) and control operations (conditionals). Part of the power of the CDFG comes from its combination of control and data constructs. To understand the CDFG, we start with pure data descriptions and then extend the model to control.
Data Flow Graphs
A data flow graph is a model of a program with no conditionals. In a high-level programming language, a code segment with no conditionals—more precisely, with only one entry and exit point—is known as a basic block. Figure 5-3 below shows a simple basic block. As the C code is executed, we would enter this basic block at the beginning and execute all the statements.
w = a + b;
x = a - c;
y = x+ d;
x = a + c;
z= y + e;
Figure 5-3. A basic block in C
Before we are able to draw the data flow graph for this code we need to modify it slightly. There are two assignments to the variable x - it appears twice on the left side of an assignment. We need to rewrite the code in single-assignment form, in which a variable appears only once on the left side.
Since our specification is C code, we assume that the statements are executed sequentially, so that any use of a variable refers to its latest assigned value. In this case, x is not reused in this block (presumably it is used elsewhere), so we just have to eliminate the multiple assignment to x . The result is shown in Figure 5-4 below, where we have used the names x1 and x2 to distinguish the separate uses of x.
w = a + b;
x1 = a - c;
y = x1 + d;
x2 = a + c;
z = y + e;
Figure 5-4. The basic block in single-assignment form.
The single-assignment form is important because it allows us to identify a unique location in the code where each named location is computed. As an introduction to the data flow graph, we use two types of nodes in the graph— round nodes denote operators and square nodes represent values. The value nodes may be either inputs to the basic block, such as a and b , or variables assigned to within the block, such as w and x1 . The data flow graph for our single-assignment code is shown in Figure 5-5 below.
The single-assignment form means that the data flow graph is acyclic—if we assigned to x multiple times, then the second assignment would form a cycle in the graph including x and the operators used to compute x .
Keeping the data flow graph acyclic is important in many types of analyses we want to do on the graph. (Of course, it is important to know whether the source code actually assigns to a variable multiple times, because some of those assignments may be mistakes. We consider the analysis of source code for proper use of assignments later in this series.)
Figure 5-5. An extended data flow graph for our sample block
The data flow graph is generally drawn in the form shown in Figure 5-6 below. Here, the variables are not explicitly represented by nodes. Instead, the edges are labeled with the variables they represent. As a result, a variable can be represented by more than one edge. However, the edges are directed and all the edges for a variable must come from a single source. We use this form for its simplicity and compactness.
Figure 5-6. Standard data flow graph for sample basic block
The data flow graph for the code makes the order in which the operations are performed in the C code much less obvious. This is one of the advantages of the data flow graph. We can use it to determine feasible reorderings of the operations, which may help us to reduce pipeline or cache conflicts.
We can also use it when the exact order of operations simply doesn't matter. The data flow graph defines a partial ordering of the operations in the basic block. We must ensure that a value is computed before it is used, but generally there are several possible orderings of evaluating expressions that satisfy this requirement.
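For instance, a legal evaluation order can be computed by topologically sorting the operator nodes. The C sketch below (our own illustration, not from the original text) does this for the basic block of Figure 5-4:

```c
#define NOPS 5

/* Operators of Figure 5-4:
   0: w = a + b   1: x1 = a - c   2: y = x1 + d
   3: x2 = a + c  4: z = y + e                    */

/* dep[i][j] != 0 means operator j must be evaluated before operator i. */
static const int dep[NOPS][NOPS] = {
    {0, 0, 0, 0, 0},   /* w  = a + b  : no dependences */
    {0, 0, 0, 0, 0},   /* x1 = a - c  : no dependences */
    {0, 1, 0, 0, 0},   /* y  = x1 + d : needs x1       */
    {0, 0, 0, 0, 0},   /* x2 = a + c  : no dependences */
    {0, 0, 1, 0, 0},   /* z  = y + e  : needs y        */
};

/* Fill order[] with a legal evaluation order; returns how many
   operators were scheduled (fewer than NOPS means a cycle). */
int topo_order(int order[NOPS])
{
    int done[NOPS] = {0};
    int n = 0, progressed = 1;

    while (n < NOPS && progressed) {
        progressed = 0;
        for (int i = 0; i < NOPS; i++) {
            int ready = !done[i];
            for (int j = 0; j < NOPS; j++)
                if (dep[i][j] && !done[j])
                    ready = 0;
            if (ready) {
                order[n++] = i;   /* all inputs available: schedule it */
                done[i] = 1;
                progressed = 1;
            }
        }
    }
    return n;
}
```

Any order this produces respects the partial ordering; a scheduler could further choose among the ready operators to reduce pipeline or cache conflicts.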
Control/Data Flow Graphs
A CDFG uses a data flow graph as an element, adding constructs to describe control. In a basic CDFG, we have two types of nodes: decision nodes and data flow nodes.
A data flow node encapsulates a complete data flow graph to represent a basic block. We can use one type of decision node to describe all the types of control in a sequential program. (The jump/branch is, after all, the way we implement all those high-level control constructs.)
Figure 5-7 below shows a bit of C code with control constructs and the CDFG constructed from it. The rectangular nodes in the graph represent the basic blocks. The basic blocks in the C code have been represented by function calls for simplicity. The diamond-shaped nodes represent the conditionals. The node's condition is given by the label, and the edges are labeled with the possible outcomes of evaluating the condition.
Building a CDFG for a while loop is straightforward, as shown in Figure 5-8 below. The while loop consists of both a test and a loop body, each of which we know how to represent in a CDFG. We can represent for loops by remembering that, in C, a for loop is defined in terms of a while loop. The following for loop
for (i = 0; i < N; i++) {
loop_body();
}
is equivalent to

i = 0;
while (i < N) {
    loop_body();
    i++;
}
if (cond1)
    basic_block_1();
else
    basic_block_2();
basic_block_3();
switch (test1) {
    case c1: basic_block_4(); break;
    case c2: basic_block_5(); break;
    case c3: basic_block_6(); break;
}
C Code
Figure 5-7. C code (above) and its CDFG (below)
i = 0;
while (i < N) {
    loop_body();
    i++;
}
Figure 5-8. C code (above) and its CDFG (below) for a while loop.
Hierarchical representation. For a complete CDFG model, we can use a data flow graph to model each data flow node. Thus, the CDFG is a hierarchical representation: a data flow node can be expanded to reveal a complete data flow graph. An execution model for a CDFG is very much like the execution of the program it represents. The CDFG does not require explicit declaration of variables, but we assume that the implementation has sufficient memory for all the variables.
We can define a state variable that represents a program counter in a CPU. (When studying a drawing of a CDFG, a finger works well for keeping track of the program counter state.) As we execute the program, we either execute the data flow node or compute the decision in the decision node and follow the appropriate edge, depending on the type of node the program counter points to.
Even though the data flow nodes may specify only a partial ordering on the data flow computations, the CDFG is a sequential representation of the program. There is only one program counter in our execution model of the CDFG, and operations are not executed in parallel.
The CDFG is not necessarily tied to high-level language control structures. We can also build a CDFG for an assembly language program. A jump instruction corresponds to a nonlocal edge in the CDFG. Some architectures, such as ARM and many VLIW processors, support predicated execution of instructions, which may be represented by special constructs in the CDFG.
Assembly and linking
Assembly and linking are the last steps in the compilation process—they turn a list of instructions into an image of the program's bits in memory. In this section, we survey the basic techniques required for assembly and linking to help us understand the complete compilation process.
Figure 5-9 below highlights the role of assemblers and linkers in the compilation process. This process is often hidden from us by compilation commands that do everything required to generate an executable program. As the figure shows, most compilers do not directly generate machine code, but instead create the instruction-level program in the form of human-readable assembly language.
Generating assembly language rather than binary instructions frees the compiler writer from details extraneous to the compilation process, which include the instruction format as well as the exact addresses of instructions and data. The assembler's job is to translate symbolic assembly language statements into bit-level representations of instructions known as object code.
The assembler takes care of instruction formats and does part of the job of translating labels into addresses. However, since the program may be built from many files, the final steps in determining the addresses of instructions and data are performed by the linker, which produces an executable binary file. That file may not necessarily be located in the CPU's memory, however, unless the linker happens to create the executable directly in RAM. The program that brings the program into memory for execution is called a loader.
Figure 5-9. Program generation from compilation through loading.
The simplest form of the assembler assumes that the starting address of the assembly language program has been specified by the programmer. The addresses in such a program are known as absolute addresses.
However, in many cases, particularly when we are creating an executable out of several component files, we do not want to specify the starting addresses for all the modules before assembly.
If we did, we would have to determine before assembly not only the length of each program in memory but also the order in which they would be linked into the program.
Most assemblers therefore allow us to use relative addresses by specifying at the start of the file that the origin of the assembly language module is to be computed later. Addresses within the module are then computed relative to the start of the module. The linker is then responsible for translating relative addresses into absolute addresses.
Assemblers. When translating assembly code into object code, the assembler must translate opcodes and format the bits in each instruction, and translate labels into addresses. In this section, we review the translation of assembly language into binary.
Labels make the assembly process more complex, but they are the most important abstraction provided by the assembler. Labels let the programmer (a human programmer or a compiler generating assembly code) avoid worrying about the absolute locations of instructions and data. Label processing requires making two passes through the assembly source code as follows:
1. The first pass scans the code to determine the address of each label.
2. The second pass assembles the instructions using the label values computed in the first pass.
As shown in Figure 5-10 below, the name of each symbol and its address is stored in a symbol table that is built during the first pass. The symbol table is built by scanning from the first instruction to the last. (For the moment, we assume that we know the absolute address of the first instruction in the program; we consider the general case later in this series).
During scanning, the current location in memory is kept in a program location counter (PLC). Despite the similarity in name to a program counter, the PLC is not used to execute the program, only to assign memory locations to labels. Unlike the program counter, which may pass over the code in a loop many times, the PLC makes exactly one pass through the program.
Thus, at the start of the first pass, the PLC is set to the program's starting address and the assembler looks at the first line. After examining the line, the assembler updates the PLC to the next location (since ARM instructions are four bytes long, the PLC would be incremented by four) and looks at the next instruction.
If the instruction begins with a label, a new entry is made in the symbol table, which includes the label name and its value. The value of the label is equal to the current value of the PLC. At the end of the first pass, the assembler rewinds to the beginning of the assembly language file to make the second pass. During the second pass, when a label name is found, the label is looked up in the symbol table and its value substituted into the appropriate place in the instruction.
Figure 5-10. Symbol table processing during assembly.
But how do we know the starting value of the PLC? The simplest case is absolute addressing. In this case, one of the first statements in the assembly language program is a pseudo-op that specifies the origin of the program, that is, the location of the first address in the program. A common name for this pseudo-op (e.g., the one used for the ARM) is the ORG statement
ORG 2000
which puts the start of the program at location 2000. This pseudo-op accomplishes this by setting the PLC's value to its argument's value, 2000 in this case. Assemblers generally allow a program to have many ORG statements in case instructions or data must be spread around various spots in memory. Example 5-1 below illustrates the use of the PLC in generating the symbol table.
Example 5-1: Generating a Symbol Table
Let's use the following simple example of ARM assembly code:
        ORG 100
label1  ADR r4,c
        LDR r0,[r4]
label2  ADR r4,d
        LDR r1,[r4]
label3  SUB r0,r0,r1
The initial ORG statement tells us the starting address of the program. To begin, let's initialize the symbol table to an empty state and put the PLC at the initial ORG statement.
The PLC value shown is at the beginning of this step, before we have processed the ORG statement. The ORG tells us to set the PLC value to 100.
To process the next statement, we move the PLC to point to the next statement. But because the last statement was a pseudo-op that generates no memory values, the PLC value remains at 100.
Because there is a label in this statement, we add it to the symbol table, taking its value from the current PLC value.
To process the next statement, we advance the PLC to point to the next line of the program and increment its value by the length in memory of the last line, namely, 4.
We continue this process as we scan the program until we reach the end, at which point the symbol table contains label1 = 100, label2 = 108, and label3 = 116, and the PLC has advanced to 120.
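The first pass just traced can be sketched in C. This is a simplified model we constructed for illustration: each source line holds an optional label plus one statement, ORG is the only pseudo-op, and every real instruction occupies four bytes:

```c
#include <stdlib.h>
#include <string.h>

/* One line of Example 5-1: an optional label and a statement. */
struct line { const char *label; const char *stmt; };

static const struct line prog[] = {
    { NULL,     "ORG 100"      },
    { "label1", "ADR r4,c"     },
    { NULL,     "LDR r0,[r4]"  },
    { "label2", "ADR r4,d"     },
    { NULL,     "LDR r1,[r4]"  },
    { "label3", "SUB r0,r0,r1" },
};
#define NLINES (int)(sizeof(prog) / sizeof(prog[0]))

struct sym { const char *name; int value; };
static struct sym symtab[NLINES];
static int nsyms;

/* First pass: track the PLC and record each label's address. */
int first_pass(void)
{
    int plc = 0;
    nsyms = 0;
    for (int i = 0; i < NLINES; i++) {
        if (strncmp(prog[i].stmt, "ORG", 3) == 0) {
            plc = atoi(prog[i].stmt + 4);  /* set the PLC; emits nothing */
            continue;
        }
        if (prog[i].label) {               /* label value = current PLC */
            symtab[nsyms].name = prog[i].label;
            symtab[nsyms].value = plc;
            nsyms++;
        }
        plc += 4;                          /* one 4-byte ARM instruction */
    }
    return plc;                            /* final PLC value */
}

int sym_lookup(const char *name)
{
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].value;
    return -1;                             /* undefined label */
}
```

The second pass would then assemble each instruction, calling sym_lookup wherever a label name appears.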
Assemblers allow labels to be added to the symbol table without occupying space in the program memory. A typical name of this pseudo-op is EQU for equate. For example, in the code
     ADD r0,r1,r2
FOO  EQU 5
BAZ  SUB r3,r4,#FOO
the EQU pseudo-op adds a label named FOO with the value 5 to the symbol table. The value of the BAZ label is the same as if the EQU pseudo-op were not present, since EQU does not advance the PLC. The new label is used in the subsequent SUB instruction as the name for a constant. EQUs can be used to define symbolic values to help make the assembly code more structured.
The ARM assembler supports one pseudo-op that is particular to the ARM instruction set. In other architectures, an address would be loaded into a register (e.g., for an indirect access) by reading it from a memory location. ARM does not have an instruction that can load an effective address, so the assembler supplies the ADR pseudo-op to create the address in the register.
It does so by using ADD or SUB instructions to generate the address. The address to be loaded can be register relative, program relative, or numeric, but it must assemble to a single instruction. More complicated address calculations must be explicitly programmed.
The assembler produces an object file that describes the instructions and data in binary format. A commonly used object file format, originally developed for Unix but now used in other environments as well, is known as COFF (common object file format). The object file must describe the instructions, data, and any addressing information and also usually carries along the symbol table for later use in debugging.
Generating relative code rather than absolute code introduces some new challenges to the assembly language process. Rather than using an ORG statement to provide the starting address, the assembly code uses a pseudo-op to indicate that the code is in fact relocatable. (Relative code is the default for both the ARM and SHARC assemblers.)
Similarly, we must mark the output object file as being relative code. We can initialize the PLC to 0 to denote that addresses are relative to the start of the file. However, when we generate code that makes use of those labels, we must be careful, since we do not yet know the actual value that must be put into the bits.
We must instead generate relocatable code. We use extra bits in the object file format to mark the relevant fields as relocatable and then insert the label's relative value into the field.
The linker must therefore modify the generated code: when it finds a field marked as relative, it uses the absolute addresses that it has generated to replace the relative value with a correct, absolute value for the address. To understand the details of turning relocatable code into absolute executable code, we must understand the linking process described in the next section.
Linking. Many assembly language programs are written as several smaller pieces rather than as a single large file. Breaking a large program into smaller files helps delineate program modularity. If the program uses library routines, those will already be preassembled, and assembly language source code for the libraries may not be available for purchase.
A linker allows a program to be stitched together out of several smaller pieces. The linker operates on the object files created by the assembler and modifies the assembled code to make the necessary links between files.
Some labels will be both defined and used in the same file. Other labels will be defined in a single file but used elsewhere as illustrated in Figure 5-11 below. The place in the file where a label is defined is known as an entry point. The place in the file where the label is used is called an external reference.
The main job of the linker is to resolve external references based on available entry points. As a result of the need to know how definitions and references connect, the assembler passes to the linker not only the object file but also the symbol table. Even if the entire symbol table is not kept for later debugging purposes, it must at least pass the entry points. External references are identified in the object code by their relative symbol identifiers.
Figure 5-11. External references and entry points.
The linker proceeds in two phases. First, it determines the absolute address of the start of each object file. The order in which object files are to be loaded is given by the user, either by specifying parameters when the linker is run or by creating a load map file that gives the order in which files are to be placed in memory.
Given the order in which files are to be placed in memory and the length of each object file, it is easy to compute the absolute starting address of each file. At the start of the second phase, the linker merges all symbol tables from the object files into a single, large table. It then edits the object files to change relative addresses into absolute addresses.
This is typically performed by having the assembler write extra bits into the object file to identify the instructions and fields that refer to labels. If a label cannot be found in the merged symbol table, it is undefined and an error message is sent to the user.
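That editing step can be sketched as follows. The record formats here (struct reloc, struct symdef) are hypothetical stand-ins for the marked fields of a real object format such as COFF, not its actual layout:

```c
#include <string.h>

/* A relocation record: byte offset of a marked 32-bit field in the
   module's code, and the symbol whose address belongs there. */
struct reloc  { int offset; const char *symbol; };

/* One entry of the merged symbol table, with its absolute address. */
struct symdef { const char *name; int addr; };

/* Patch every marked field; returns -1 if a symbol is undefined. */
int relocate(int *code,
             const struct reloc *relocs, int nrelocs,
             const struct symdef *syms, int nsyms)
{
    for (int i = 0; i < nrelocs; i++) {
        int found = 0;
        for (int j = 0; j < nsyms; j++) {
            if (strcmp(syms[j].name, relocs[i].symbol) == 0) {
                code[relocs[i].offset / 4] = syms[j].addr;  /* 4-byte words */
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;   /* undefined reference: report an error */
    }
    return 0;
}
```

A real linker would add the field's original contents to the symbol's address rather than overwrite it, and would handle several relocation types, but the lookup-and-patch structure is the same.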
Controlling where code modules are loaded into memory is important in embedded systems. Some data structures and instructions, such as those used to manage interrupts, must be put at precise memory locations for them to work. In other cases, different types of memory may be installed at different address ranges. For example, if we have EPROM in some locations and DRAM in others, we want to make sure that locations to be written are put in the DRAM locations.
Workstations and PCs provide dynamically linked libraries, and certain sophisticated embedded computing environments may provide them as well. Rather than link a separate copy of commonly used routines such as I/O to every executable program on the system, dynamically linked libraries allow them to be linked in at the start of program execution.
A brief linking process is run just before execution of the program begins; the dynamic linker uses code libraries to link in the required routines. This not only saves storage space but also allows programs that use those libraries to be easily updated. However, it does introduce a delay before the program starts executing.
Next, in Part 3: Basic compilation techniques.
To read Part 1, go to "Program design and analysis."
Used with the permission of the publisher, Newnes/Elsevier, this series of six articles is based on copyrighted material from "Computers as Components: Principles of Embedded Computer System Design" by Wayne Wolf. The book can be purchased on line.
Wayne Wolf is currently the Georgia Research Alliance Eminent Scholar holding the Rhesa "Ray" S. Farmer, Jr., Distinguished Chair in Embedded Computer Systems at Georgia Tech's School of Electrical and Computer Engineering (ECE). Previously a professor of electrical engineering at Princeton University, he worked at AT&T Bell Laboratories. He has served as editor in chief of the ACM Transactions on Embedded Computing and of Design Automation for Embedded Systems.
References:
[Dou99] Bruce Powell Douglass, Doing Hard Time: Developing Real-Time Systems with UML. Addison-Wesley, 1999.
[Chi94] M. Chiodo et al., "Hardware/Software Codesign of Embedded Systems," IEEE Micro, 1994.
Copyright 2005 © CMP Media LLC
The basics of programming embedded processors: Part 1
http://www.embedded.com/showArticle.jhtml?articleID=201200638
Wayne Wolf, Embedded.com Jul 24 2007 (1:00 AM)
Designing and implementing embedded programs is different and more challenging than writing typical workstation or PC programs. Embedded code must not only provide rich functionality, it must also often run at a required rate to meet system deadlines, fit into the allowed amount of memory, and meet power consumption requirements.
Designing code that simultaneously meets multiple design constraints is a considerable challenge, but luckily there are techniques and tools that we can use to help us through the design process. Making sure that the program works is also a challenge, but once again methods and tools come to our aid.
In this series of six articles we will concentrate on high-level programming languages, specifically the C language. High-level languages were once shunned as too inefficient for embedded microcontrollers, but better compilers, more compiler-friendly architectures, and faster processors and memory have made high-level language programs common.
Some sections of a program may still need to be written in assembly language if the compiler doesn't give sufficiently good results, but even when coding in assembly language it is often helpful to think about the program's functionality in high-level form. Many of the analysis and optimization techniques that we study in this chapter are equally applicable to programs written in assembly language.
Future parts in this series will discuss (1) the control/data flow graph as a model for high-level language programs (which can also be applied to programs written originally in assembly language); (2) the assembly and linking process; (3) the basic steps in compilation; and (4) optimization techniques specific to embedded computing for energy consumption, performance, and size. This first part focuses on design patterns.

Design Patterns
A design pattern is a generalized description of a way to solve a certain class of problems. As a simple example, we could write C code for one implementation of a linked list, but that code would set in concrete the data items available in the list, actions on errors, and so on. A design pattern describing the list mechanism would capture the essential components and behaviors of the list without adding unnecessary detail. A design pattern can be described in the Unified Modeling Language (UML); it usually takes the form of a collaboration diagram, which shows how classes work together to perform a given function.
Figure 5-1 below shows a simple description of a design pattern as a UML class diagram. The diagram defines two classes: List to describe the entire list and List-element for one of the items in the list. The List class defines the basic operations that you want to do on a list.
The details of what goes in the list and so forth can be easily added into this design pattern. A design pattern can be parameterized so that it can be customized to the needs of a particular application. A more complete description of the pattern might include
state diagrams to describe behavior and
sequence diagrams to show how classes interact.
Figure 5-1. A simple description of a design pattern
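As a rough C rendering of this pattern (our own sketch; the article describes it only as UML classes), List and List-element might map onto structures like these, with the payload type left as a parameter of the pattern:

```c
#include <stdlib.h>

/* One item in the list; the payload is an opaque pointer, so the
   same pattern serves any element type. */
struct list_element {
    void *item;
    struct list_element *next;
};

struct list {
    struct list_element *head;
    int length;
};

void list_init(struct list *l)
{
    l->head = NULL;
    l->length = 0;
}

/* Add to the front; returns 0 on success, -1 on allocation failure. */
int list_add(struct list *l, void *item)
{
    struct list_element *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->item = item;
    e->next = l->head;
    l->head = e;
    l->length++;
    return 0;
}

int list_length(const struct list *l)
{
    return l->length;
}
```

Specializing the pattern for an application means fixing the payload type, adding the operations that application needs (find, delete, iterate), and choosing an error policy for allocation failure.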
Design patterns are primarily intended to help solve midlevel design challenges. A design pattern may include only a single class, but it usually describes a handful of classes.
Design patterns rarely include more than a few dozen classes. A design pattern probably will not provide you with the complete architecture of your system, but it can provide you with the architectures for many subsystems in your design:
By stitching together and specializing existing design patterns, you may be able to quickly create a large part of your system architecture.
Design patterns are meant to be used in ways similar to how engineers in other disciplines work. A designer can consult catalogs of design patterns to find patterns that seem to fit a particular design problem.
The designer can then choose parameters suited to the application and see what that implies for the implementation of the design pattern. The designer can then choose the design pattern that seems to be the best match for the design, parameterize it, and instantiate it.
Design patterns can be of many different types. A few are listed below.
The digital filter is easily described as a design pattern.
Data structures and their associated actions can be described as design patterns.
A reactive system that reacts to external stimuli can be described as a design pattern, leaving the exact state transition diagram as a parameter.
Douglass [Dou99] describes a policy class that describes a protocol that can be used to implement a variety of policies.
Design Patterns for Embedded Systems
In this section, we consider design patterns for two very different styles of programs: the state machine and the circular buffer. State machines are well suited to reactive systems such as user interfaces, and circular buffers are useful in digital signal processing.
State machine style. When inputs appear intermittently rather than as periodic samples, it is often convenient to think of the system as reacting to those inputs. The reaction of most systems can be characterized in terms of the input received and the current state of the system. This leads naturally to a finite-state machine style of describing the reactive system's behavior.
Moreover, if the behavior is specified in that way, it is natural to write the program implementing that behavior in a state machine style. The state machine style of programming is also an efficient implementation of such computations.
Finite-state machines are usually first encountered in the context of hardware design. The programming example described below shows how to write a finite-state machine in a high-level programming language.
Programming Example: A state machine in C
The behavior we want to implement is a simple seat belt controller [Chi94]. The controller's job is to turn on a buzzer if a person sits in a seat and does not fasten the seat belt within a fixed amount of time. This system has three inputs and one output.
The inputs are a sensor for the seat to know when a person has sat down, a seat belt sensor that tells when the belt is fastened, and a timer that goes off when the required time interval has elapsed. The output is the buzzer. Shown below is a state diagram that describes the seat belt controller's behavior.
The idle state is in force when there is no person in the seat. When the person sits down, the machine goes into the seated state and turns on the timer. If the timer goes off before the seat belt is fastened, the machine goes into the buzzer state. If the seat belt goes on first, it enters the belted state. When the person leaves the seat, the machine goes back to idle.
To write this behavior in C, we will assume that we have loaded the current values of all three inputs (seat, belt, timer) into variables and will similarly hold the outputs in variables temporarily (timer_on, buzzer_on). We will use a variable named state to hold the current state of the machine and a switch statement to determine what action to take in each state. The code follows:

#define IDLE 0
#define SEATED 1
#define BELTED 2
#define BUZZER 3
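The article's listing is not reproduced above, so the switch statement below is our reconstruction of the transitions described by the state diagram (the constants are repeated so the fragment is self-contained):

```c
#define IDLE   0
#define SEATED 1
#define BELTED 2
#define BUZZER 3

/* One step of the controller: given the current state and the three
   inputs, set the two outputs and return the next state. */
int seatbelt_step(int state, int seat, int belt, int timer,
                  int *timer_on, int *buzzer_on)
{
    *timer_on = 0;
    *buzzer_on = 0;
    switch (state) {
    case IDLE:
        if (seat) {                 /* person sat down: start the timer */
            state = SEATED;
            *timer_on = 1;
        }
        break;
    case SEATED:
        if (!seat)                  /* person left before belting */
            state = IDLE;
        else if (belt)              /* belt fastened in time */
            state = BELTED;
        else if (timer)             /* timed out: start buzzing */
            state = BUZZER;
        break;
    case BELTED:
        if (!seat)                  /* person left the seat */
            state = IDLE;
        else if (!belt) {           /* belt undone: restart the timer */
            state = SEATED;
            *timer_on = 1;
        }
        break;
    case BUZZER:
        *buzzer_on = 1;
        if (belt)                   /* belt finally fastened */
            state = BELTED;
        else if (!seat)             /* person left the seat */
            state = IDLE;
        break;
    }
    return state;
}
```

Cases where no condition fires leave state unchanged, which is how the self-loops of the diagram fall out of the code for free.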
This code takes advantage of the fact that the state will remain the same unless explicitly changed; this makes self-loops back to the same state easy to implement.
This state machine may be executed forever in a while(TRUE) loop or periodically called by some other code. In either case, the code must be executed regularly so that it can check on the current value of the inputs and, if necessary, go into a new state.
Data stream style. The data stream style makes sense for data that comes in regularly and must be processed on the fly. The FIR filter of the example shown above is a classic example of stream-oriented processing. For each sample, the filter must emit one output that depends on the values of the last n inputs.
In a typical workstation application, we would process the samples over a given interval by reading them all in from a file and then computing the results all at once in a batch process. In an embedded system we must not only emit outputs in real time, but we must also do so using a minimum amount of memory.
The circular buffer is a data structure that lets us handle streaming data in an efficient way. Figure 5-2 below illustrates how a circular buffer stores a subset of the data stream. At each point in time, the algorithm needs a subset of the data stream that forms a window into the stream.
The window slides with time as we throw out old values no longer needed and add new values. Since the size of the window does not change, we can use a fixed-size buffer to hold the current data.
Figure 5-2. A circular buffer for streaming data
To avoid constantly copying data within the buffer, we will move the head of the buffer in time. The buffer points to the location at which the next sample will be placed; every time we add a sample, we automatically overwrite the oldest sample, which is the one that needs to be thrown out.
When the pointer gets to the end of the buffer, it wraps around to the top. Described below is an example of an efficient implementation of a circular buffer.
Programming Example: A circular buffer for an FIR filter
Appearing below are the declarations for the circular buffer and filter coefficients, assuming that N, the number of taps in the filter, has been previously defined.

int circ_buffer[N]; /* circular buffer for data */
int circ_buffer_head = 0; /* current head of the buffer */
int c[N]; /* filter coefficients (constants) */
To write C code for a circular buffer-based FIR filter, we need to modify the original loop slightly. Because the 0th element of data may not be in the 0th element of the circular buffer, we have to change the way in which we access the data. One of the implications of this is that we need separate loop indices for the circular buffer and coefficients.
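The modified loop might look as follows. This is our reconstruction under two stated assumptions: N is fixed at 8 for concreteness, and c[0] multiplies the newest sample:

```c
#define N 8

int circ_buffer[N];         /* circular buffer for data */
int circ_buffer_head = 0;   /* current head of the buffer */
int c[N];                   /* filter coefficients (constants) */

/* Store a new sample (overwriting the oldest) and compute one output. */
int fir_step(int sample)
{
    int sum = 0;
    int ibuff, ic;

    circ_buffer[circ_buffer_head] = sample;

    /* Walk backward from the newest sample. Note the separate indices:
       the 0th data element need not sit in the 0th buffer element. */
    ibuff = circ_buffer_head;
    for (ic = 0; ic < N; ic++) {
        sum += c[ic] * circ_buffer[ibuff];
        ibuff = (ibuff == 0) ? N - 1 : ibuff - 1;   /* wrap at the start */
    }

    /* Advance the head, wrapping at the end of the array. */
    circ_buffer_head = (circ_buffer_head == N - 1) ? 0 : circ_buffer_head + 1;
    return sum;
}
```

No data is ever copied within the buffer; only the head pointer and the loop indices move.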
The above code assumes that some other code, such as an interrupt handler, is replacing the last element of the circular buffer at the appropriate times. The statement ibuff = (ibuff == (N - 1)) ? 0 : ibuff + 1; is a shorthand C way of incrementing ibuff such that it returns to 0 after reaching the end of the circular buffer array.
Next in Part 2: Models of programs.
Wayne Wolf, Embedded.com Jul 24 2007 (1:00 AM
Designing and implementing embedded programs is different and more challenging than writing typical workstation or PC programs. Embedded code must not only provide rich functionality, it must also often run at a required rate to meet system deadlines, fit into the allowed amount of memory, and meet power consumption requirements.
Designing code that simultaneously meets multiple design constraints is a considerable challenge, but luckily there are techniques and tools that we can use to help us through the design process. Making sure that the program works is also a challenge, but once again methods and tools come to our aid.
Presented in this series of six articles we will contentrate on high-level programming languages, specifically the C language. High-level languages were once shunned as too inefficient for embedded microcontrollers, but better compilers, more compilerfriendly architectures, and faster processors and memory have made highlevel language programs common.
Some sections of a program may still need to be written in assembly language if the compiler doesn't give sufficiently good results, but even when coding in assembly language it is often helpful to think about the program's functionality in high-level form. Many of the analysis and optimization techniques that we study in this chapter are equally applicable to programs written in assembly language.
Future parts in this series will discuss (1) the control/data flow graph as a model for high-level language programs (which can also be applied to programs written originally in assembly language), with a particular focus on design patterns; (2) the assembly and linking process; (3) the basic steps in compilation; and (4) optimization techniques specific to embedded computing for energy consumption, performance, and size.

Design Patterns
A design pattern is a generalized description of a way to solve a certain class of problems. As a simple example, we could write C code for one implementation of a linked list, but that code would set in concrete the data items available in the list, actions on errors, and so on. A design pattern describing the list mechanism would capture the essential components and behaviors of the list without adding unnecessary detail. A design pattern can be described in the Unified Modeling Language (UML); it usually takes the form of a collaboration diagram, which shows how classes work together to perform a given function.
Figure 5-1 below shows a simple description of a design pattern as a UML class diagram. The diagram defines two classes: List to describe the entire list and List-element for one of the items in the list. The List class defines the basic operations that you want to do on a list.
The details of what goes in the list and so forth can be easily added into this design pattern. A design pattern can be parameterized so that it can be customized to the needs of a particular application. A more complete description of the pattern might include
state diagrams to describe behavior and
sequence diagrams to show how classes interact.
Figure 5-1. A simple description of a design pattern
Design patterns are primarily intended to help solve midlevel design challenges. A design pattern may include only a single class, but it usually describes a handful of classes.
Design patterns rarely include more than a few dozen classes. A design pattern probably will not provide you with the complete architecture of your system, but it can provide you with the architectures for many subsystems in your design:
By stitching together and specializing existing design patterns, you may be able to quickly create a large part of your system architecture.
Design patterns are meant to be used much the way engineers in other disciplines work: a designer consults catalogs of design patterns to find ones that seem to fit a particular design problem.
The designer can then choose the pattern that seems to be the best match, select parameters suited to the application, see what that implies for the implementation, and instantiate it.
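As an illustration of instantiating a pattern, the list pattern mentioned earlier might look roughly like this in C. The struct and function names here are hypothetical, not from the book's Figure 5-1; the payload type (int) and the error action (return a failure code) are the parameters chosen for this particular instance:

```c
#include <stdlib.h>

/* One item in the list; the payload type is a parameter of the pattern,
   fixed to int for this instance. */
struct list_element {
    int value;
    struct list_element *next;
};

/* The List class from the pattern: holds the head and the basic operations. */
struct list {
    struct list_element *head;
};

/* Add a new value at the front of the list. Returns 0 on success, -1 on
   allocation failure -- the error action is another pattern parameter. */
int list_add(struct list *l, int value) {
    struct list_element *e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->value = value;
    e->next = l->head;
    l->head = e;
    return 0;
}

/* Return nonzero if value is present in the list. */
int list_find(const struct list *l, int value) {
    for (const struct list_element *e = l->head; e != NULL; e = e->next)
        if (e->value == value)
            return 1;
    return 0;
}
```

A different instantiation might swap the payload type, add a tail pointer, or abort on allocation failure; the pattern itself stays the same.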
Design patterns can be of many different types. A few are listed below.
The digital filter is easily described as a design pattern.
Data structures and their associated actions can be described as design patterns.
A reactive system that reacts to external stimuli can be described as a design pattern, leaving the exact state transition diagram as a parameter.
Douglass [Dou99] describes a policy class that captures a protocol that can be used to implement a variety of policies.
Design Patterns for Embedded Systems
In this section, we consider design patterns for two very different styles of programs: the state machine and the circular buffer. State machines are well suited to reactive systems such as user interfaces, and circular buffers are useful in digital signal processing.
State machine style. When inputs appear intermittently rather than as periodic samples, it is often convenient to think of the system as reacting to those inputs. The reaction of most systems can be characterized in terms of the input received and the current state of the system. This leads naturally to a finite-state machine style of describing the reactive system's behavior.
Moreover, if the behavior is specified in that way, it is natural to write the program implementing that behavior in a state machine style. The state machine style of programming is also an efficient implementation of such computations.
Finite-state machines are usually first encountered in the context of hardware design. The programming example described below shows how to write a finite-state machine in a high-level programming language.
Programming Example: A state machine in C
The behavior we want to implement is a simple seat belt controller [Chi94]. The controller's job is to turn on a buzzer if a person sits in a seat and does not fasten the seat belt within a fixed amount of time. This system has three inputs and one output.
The inputs are a sensor for the seat to know when a person has sat down, a seat belt sensor that tells when the belt is fastened, and a timer that goes off when the required time interval has elapsed. The output is the buzzer. Shown below is a state diagram that describes the seat belt controller's behavior.
The idle state is in force when there is no person in the seat. When the person sits down, the machine goes into the seated state and turns on the timer. If the timer goes off before the seat belt is fastened, the machine goes into the buzzer state. If the seat belt goes on first, it enters the belted state. When the person leaves the seat, the machine goes back to idle.
To write this behavior in C, we will assume that we have loaded the current values of all three inputs (seat, belt, timer) into variables and will similarly hold the outputs in variables temporarily (timer_on, buzzer_on). We will use a variable named state to hold the current state of the machine and a switch statement to determine what action to take in each state. The state definitions follow:

#define IDLE 0
#define SEATED 1
#define BELTED 2
#define BUZZER 3
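The switch statement itself is not reproduced in this reprint. A self-contained sketch (repeating the state definitions) that is consistent with the state diagram described above might be the following; the transitions out of the buzzer state, and the exact handling of timer_on and buzzer_on, are assumptions rather than the book's original listing:

```c
#define IDLE 0
#define SEATED 1
#define BELTED 2
#define BUZZER 3

/* Inputs and outputs are assumed to have been loaded into these variables. */
int seat, belt, timer;
int timer_on, buzzer_on;
int state = IDLE;

void seatbelt_fsm(void) {
    switch (state) {
    case IDLE:
        if (seat) { state = SEATED; timer_on = 1; }
        break;                        /* self-loop: state stays IDLE */
    case SEATED:
        if (timer) state = BUZZER;    /* timer expired before the belt */
        else if (belt) state = BELTED;
        else if (!seat) state = IDLE; /* person left before belting */
        break;
    case BELTED:
        if (!seat) state = IDLE;      /* person left the seat */
        break;
    case BUZZER:
        if (belt) state = BELTED;     /* belt finally fastened */
        else if (!seat) state = IDLE; /* person left the seat */
        break;
    }
    buzzer_on = (state == BUZZER);    /* buzzer sounds only in BUZZER */
}
```

Note how each case only writes state when a transition fires; falling through the if chain leaves state unchanged, which is exactly the self-loop behavior discussed next.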
This code takes advantage of the fact that the state will remain the same unless explicitly changed; this makes self-loops back to the same state easy to implement.
This state machine may be executed forever in a while(TRUE) loop or periodically called by some other code. In either case, the code must be executed regularly so that it can check on the current value of the inputs and, if necessary, go into a new state.
Data stream style. The data stream style makes sense for data that comes in regularly and must be processed on the fly. The FIR filter of the example shown above is a classic example of stream-oriented processing. For each sample, the filter must emit one output that depends on the values of the last n inputs.
In a typical workstation application, we would process the samples over a given interval by reading them all in from a file and then computing the results all at once in a batch process. In an embedded system we must not only emit outputs in real time, but we must also do so using a minimum amount of memory.
The circular buffer is a data structure that lets us handle streaming data in an efficient way. Figure 5-2 below illustrates how a circular buffer stores a subset of the data stream. At each point in time, the algorithm needs a subset of the data stream that forms a window into the stream.
The window slides with time as we throw out old values no longer needed and add new values. Since the size of the window does not change, we can use a fixed-size buffer to hold the current data.
Figure 5-2. A circular buffer for streaming data
To avoid constantly copying data within the buffer, we will move the head of the buffer in time. The head points to the location at which the next sample will be placed; every time we add a sample, we automatically overwrite the oldest sample, which is the one that needs to be thrown out.
When the pointer gets to the end of the buffer, it wraps around to the top. Described below is an example of an efficient implementation of a circular buffer.
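A minimal sketch of this head-pointer scheme in C, with illustrative names (buf, head, add_sample are ours, not the book's):

```c
#define N 8   /* window size; any fixed length works */

int buf[N];   /* the circular buffer */
int head = 0; /* index where the next sample will be written */

/* Store a new sample, overwriting the oldest one, then advance the head,
   wrapping from the end of the array back to the start. */
void add_sample(int x) {
    buf[head] = x;
    head = (head == N - 1) ? 0 : head + 1;
}
```

After N calls the head has wrapped back to index 0, so the buffer always holds the most recent N samples with no copying.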
Programming Example: A circular buffer for an FIR filter
Appearing below are the declarations for the circular buffer and filter coefficients, assuming that N, the number of taps in the filter, has been previously defined:

int circ_buffer[N]; /* circular buffer for data */
int circ_buffer_head = 0; /* current head of the buffer */
int c[N]; /* filter coefficients (constants) */
To write C code for a circular buffer-based FIR filter, we need to modify the original loop slightly. Because the 0th element of data may not be in the 0th element of the circular buffer, we have to change the way in which we access the data. One of the implications of this is that we need separate loop indices for the circular buffer and coefficients.
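The modified loop itself is not reproduced in this reprint. A sketch consistent with the declarations above and the ibuff increment idiom discussed below might be as follows; the function name fir_output, the value of N, and the coefficient values are illustrative assumptions:

```c
#define N 4
int circ_buffer[N];         /* circular buffer for data */
int circ_buffer_head = 0;   /* current head of the buffer */
int c[N] = {1, 1, 1, 1};    /* filter coefficients (illustrative values) */

/* Compute one FIR output. The coefficient index j and the buffer index
   ibuff advance together, but ibuff starts at the head and wraps around
   the end of the circular buffer. */
int fir_output(void) {
    int f = 0;                    /* accumulator for the filter output */
    int ibuff = circ_buffer_head; /* oldest-sample index in this scheme */
    for (int j = 0; j < N; j++) {
        f += c[j] * circ_buffer[ibuff];
        ibuff = (ibuff == N - 1) ? 0 : ibuff + 1;
    }
    return f;
}
```

The two indices are the point of the exercise: j walks the coefficients in order while ibuff chases the data around the ring.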
The above code assumes that some other code, such as an interrupt handler, is replacing the last element of the circular buffer at the appropriate times. The statement ibuff = (ibuff == (N - 1)) ? 0 : ibuff + 1; is a shorthand C way of incrementing ibuff such that it returns to 0 after reaching the end of the circular buffer array.
Next in Part 2: Models of programs.
Used with the permission of the publisher, Newnes/Elsevier, this series of six articles is based on copyrighted material from "Computers as Components: Principles of Embedded Computer System Design" by Wayne Wolf. The book can be purchased online.
Wayne Wolf is currently the Georgia Research Alliance Eminent Scholar holding the Rhesa "Ray" S. Farmer, Jr., Distinguished Chair in Embedded Computer Systems at Georgia Tech's School of Electrical and Computer Engineering (ECE). Previously a professor of electrical engineering at Princeton University, he worked at AT&T Bell Laboratories. He has served as editor in chief of the ACM Transactions on Embedded Computing and of Design Automation for Embedded Systems.
References:
[Dou99] Bruce Powell Douglass, Doing Hard Time: Developing Real-Time Systems with UML. Addison-Wesley, 1999.
[Chi94] M. Chiodo et al., "Hardware/Software Co-design of Embedded Systems," IEEE Micro, 1994.
Copyright 2005 © CMP Media LLC